Finite Size Corrections and Likelihood Ratio Fluctuations in the Spiked Wigner Model

Ahmed El Alaoui Department of EECS, UC Berkeley, CA. Email: elalaoui@berkeley.edu    Florent Krzakala Laboratoire de Physique Statistique, CNRS, PSL Universités & Ecole Normale Supérieure, Sorbonne Universités et Université Pierre & Marie Curie, Paris, France.    Michael I. Jordan Departments of EECS and Statistics, UC Berkeley, CA.
Abstract

In this paper we study principal components analysis in the regime of high dimensionality and high noise. Our model of the problem is a rank-one deformation of a Wigner matrix where the signal-to-noise ratio (SNR) is of constant order, and we are interested in the fundamental limits of detection of the spike. Our main goal is to gain a fine understanding of the asymptotics for the log-likelihood ratio process, also known as the free energy, as a function of the SNR. Our main results are twofold. We first prove that the free energy has a finite-size correction to its limit, the replica-symmetric formula, which we explicitly compute. This provides a formula for the Kullback-Leibler divergence between the planted and null models. Second, we prove that below the reconstruction threshold, where it becomes impossible to reconstruct the spike, the log-likelihood ratio has fluctuations of constant order and converges in distribution to a Gaussian, under the planted model and, with some restrictions, under the null model. As a consequence, we provide a general proof of contiguity between these two distributions that holds up to the reconstruction threshold and is valid for an arbitrary separable prior on the spike. Formulae for the total variation distance and for the Type-I and Type-II errors of the optimal test are also given. Our proofs are based on Gaussian interpolation methods and a rigorous incarnation of the cavity method, as devised by Guerra and Talagrand in their study of the Sherrington-Kirkpatrick spin-glass model.

1 Introduction

Spiked models, which are distributions over matrices of the form "signal + noise," have been a mainstay in the statistical literature since their introduction by Johnstone (2001) as models for the study of high-dimensional principal component analysis that are tractable yet realistic. Spectral properties of these models have been extensively studied, in particular in random matrix theory, where they are known as deformed ensembles (Péché, 2014). Landmark investigations in this area (Baik et al., 2005; Baik and Silverstein, 2006; Péché, 2006; Féral and Péché, 2007; Capitaine et al., 2009) have established the existence of a spectral threshold above which the top eigenvalue detaches from the bulk of eigenvalues and becomes informative about the spike, and below which the top eigenvalue bears no information. Estimation using the top eigenvector undergoes the same transition, where it is known to "lose track" of the spike below the spectral threshold (Paul, 2007; Nadler, 2008; Johnstone and Lu, 2009; Benaych-Georges and Nadakuditi, 2011). Although these spectral analyses have provided many insights, as have analyses based on more thoroughgoing usage of spectral data and/or more advanced optimization-based procedures (see Amini and Wainwright, 2009; Berthet and Rigollet, 2013; Onatski et al., 2013, 2014; Dobriban, 2017, and references therein), they stop short of characterizing the fundamental limits of estimating the spike, or of detecting its presence from the observation of a sample matrix. These questions, information-theoretic and statistical in nature, are more naturally approached by looking at objects such as the posterior law of the spike and the associated likelihood ratio process.

The main approach to date for the challenging problem of controlling the likelihood ratio is the second moment method. Controlling the second moment enables one to show contiguity, in the sense of Le Cam (1960), between the planted and null models, and thus to declare the impossibility of strong detection, i.e., of vanishing Type-I and Type-II errors for any given test, in the region where this second moment is bounded (Banks et al., 2017; Perry et al., 2016). This method is known, however, to require careful conditioning and truncation due to the existence of rare but catastrophic events under which the likelihood ratio becomes exponentially large. These events thus dominate the second moment, although they are virtually irrelevant to the detection task. Moreover, even after conditioning, the method may fail to identify the detection thresholds, depending on the structure of the spike. Furthermore, contiguity has little or no bearing on the problem of weak detection: when errors are inevitable, what is the smallest error achievable by any test?

Motivated by a desire to overcome these limitations, we consider a particularly simple spiked model, the rank-one spiked Wigner model, and provide an alternative approach to the detection problem that obviates the use of the second moment method altogether. This is achieved by obtaining asymptotic distributional results for the log-likelihood ratio process, then appealing to standard results from the theory of statistical experiments. We are thereby able to provide solutions to both the strong and weak variants of the detection problem. To study the likelihood ratio in this setting we build on the technology developed by Aizenman, Guerra, Panchenko, Talagrand, and many others in their study of the Sherrington-Kirkpatrick (SK) spin-glass model. Specifically, we make use of Gaussian interpolation methods and Talagrand's cavity method.

1.1 Setup and summary of the results

In the spiked Wigner model, one observes a rank-one deformation of a Wigner matrix $\bm{W}$:

\[
\bm{Y}=\sqrt{\frac{\lambda}{N}}\,\bm{x}^{*}\bm{x}^{*\top}+\bm{W}, \tag{1}
\]

where $W_{ij}=W_{ji}\sim\mathcal{N}(0,1)$ and $W_{ii}\sim\mathcal{N}(0,2)$ are independent for all $1\leq i\leq j\leq N$. The spike vector $\bm{x}^{*}\in\mathbb{R}^{N}$ represents the signal to be recovered, or whose presence is to be detected. We assume that the entries $x^{*}_{i}$ of the spike are drawn i.i.d. from a prior distribution $P_{\textup{x}}$ on $\mathbb{R}$ having bounded support. The parameter $\lambda\geq 0$ plays the role of the signal-to-noise ratio, and the scaling by $\sqrt{N}$ is such that the signal and noise components of the observed data are of comparable magnitudes. This places the problem in a high-noise regime where consistency is not possible but partial recovery still is. As a matter of convenience, we discard the diagonal terms $Y_{ii}$ from the observations. (Adding the diagonal back does not pose any additional technical difficulties, and our results can be straightforwardly extended to this case.) We endow the real line with the Borel $\sigma$-algebra and define $P_{\textup{x}}$ on it. We denote by $\mathbb{P}_{\lambda}$ the joint probability law of the observations $\bm{Y}=\{Y_{ij}:1\leq i<j\leq N\}$ as per (1) and define the likelihood ratio

\[
L(\bm{Y};\lambda):=\frac{\mathrm{d}\mathbb{P}_{\lambda}}{\mathrm{d}\mathbb{P}_{0}}(\bm{Y}). \tag{2}
\]

A simple computation based on conditioning on $\bm{x}^{*}$ reveals that

\[
L(\bm{Y};\lambda)=\int\exp\Big(\sqrt{\frac{\lambda}{N}}\sum_{i<j}Y_{ij}x_{i}x_{j}-\frac{\lambda}{2N}\sum_{i<j}x_{i}^{2}x_{j}^{2}\Big)\,\mathrm{d}P_{\textup{x}}^{\otimes N}(\bm{x}). \tag{3}
\]
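To make the model and formula (3) concrete, here is a small self-contained numerical sketch (our own illustration, not code from the paper): it draws $\bm{Y}$ as in (1) for the Rademacher prior $P_{\textup{x}}=\frac12\delta_{-1}+\frac12\delta_{+1}$ and evaluates $L(\bm{Y};\lambda)$ exactly by enumerating all $2^{N}$ spike configurations, which is feasible only for very small $N$.

```python
import numpy as np
from itertools import product

def sample_Y(N, lam, x_star, rng):
    """Draw Y = sqrt(lam/N) x* x*^T + W as in (1); the diagonal is irrelevant
    here since, as in the text, only {Y_ij : i < j} is observed."""
    G = rng.standard_normal((N, N))
    W = (G + G.T) / np.sqrt(2.0)   # symmetric noise, off-diagonal variance 1
    return np.sqrt(lam / N) * np.outer(x_star, x_star) + W

def likelihood_ratio(Y, lam):
    """Evaluate (3) exactly for the Rademacher prior by enumerating x in {-1,+1}^N."""
    N = Y.shape[0]
    i, j = np.triu_indices(N, k=1)
    y = Y[i, j]
    X = np.array(list(product([-1.0, 1.0], repeat=N)))   # all 2^N configurations
    P = X[:, i] * X[:, j]                                # x_i x_j for each pair i < j
    const = lam / (2 * N) * len(y)                       # x_i^2 x_j^2 = 1 for this prior
    return float(np.mean(np.exp(np.sqrt(lam / N) * (P @ y) - const)))

rng = np.random.default_rng(0)
N, lam = 8, 0.5
x_star = rng.choice([-1.0, 1.0], size=N)
L = likelihood_ratio(sample_Y(N, lam, x_star, rng), lam)
```

Since $L$ is a likelihood ratio, $\mathbb{E}_{0}[L]=1$; averaging `likelihood_ratio` over many null samples ($\lambda=0$ in `sample_Y`) gives a quick sanity check of the implementation.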

We define the free energy (density) associated with the model $\mathbb{P}_{\lambda}$ to be

\[
F_{N}:=\frac{1}{N}\,\mathbb{E}_{\mathbb{P}_{\lambda}}\log L(\bm{Y};\lambda). \tag{4}
\]

We see that $F_{N}=\frac{1}{N}D_{\mathsf{KL}}(\mathbb{P}_{\lambda},\mathbb{P}_{0})$, where $D_{\mathsf{KL}}$ is the Kullback-Leibler divergence between probability measures. (The free energy is usually defined differently in the literature, as the log-normalizing constant in the posterior of $\bm{x}^{*}$ given $\bm{Y}$. The two definitions are strictly equivalent.) It was initially argued via heuristic replica and cavity computations (Lesieur et al., 2015, 2017) that $F_{N}$ converges to a limit $\phi_{\mathsf{RS}}(\lambda)$, which is referred to as the replica-symmetric formula. This formula, variational in nature, encodes in principle a full characterization of the limits of estimating the spike with non-trivial accuracy. Indeed, various formulae for other information-theoretic quantities can be deduced from it, including the mutual information between $\bm{x}^{*}$ and $\bm{Y}$, the minimal mean squared error of estimating $\bm{x}^{*}$ based on $\bm{Y}$, and the overlap $|\bm{x}^{\top}\bm{x}^{*}|/N$ of a draw $\bm{x}$ from the posterior $\mathbb{P}_{\lambda}(\cdot|\bm{Y})$ with the spike $\bm{x}^{*}$. Most of these claims have subsequently been proved rigorously in a series of papers (Deshpande and Montanari, 2014; Deshpande et al., 2016; Barbier et al., 2016; Krzakala et al., 2016; Lelarge and Miolane, 2016) under various assumptions on the prior. However, these results stop short of providing explicit characterizations of thresholds for the detection problem.

The main goal of this paper is to gain a more refined understanding of the asymptotic behavior of the log-likelihood ratio $\log L(\bm{Y};\lambda)$, and its mean $NF_{N}$, under $\mathbb{P}_{\lambda}$ as $N$ becomes large. We first determine the finite-size correction of $F_{N}$ to its limit $\phi_{\mathsf{RS}}(\lambda)$: we prove (under conditions on $P_{\textup{x}}$) that $N(F_{N}-\phi_{\mathsf{RS}}(\lambda))$ converges to a limit $\psi_{\mathsf{RS}}(\lambda)$ at rate $\mathcal{O}(1/\sqrt{N})$. Besides providing an explicit rate of convergence of $F_{N}$ to its limit, this result translates into a formula for the Kullback-Leibler divergence $D_{\mathsf{KL}}$, which is particularly interesting below the reconstruction threshold: we will see that in this regime $\phi_{\mathsf{RS}}(\lambda)=0$, so that $D_{\mathsf{KL}}$ ceases to be extensive in the size of the system and converges to a finite value $\psi_{\mathsf{RS}}(\lambda)$.

Second, we prove that in this same regime the log-likelihood ratio $\log L(\bm{Y};\lambda)$ has fluctuations of constant order under $\mathbb{P}_{\lambda}$ and converges in distribution to a Gaussian whose mean equals half its variance. This allows us to provide an alternative proof of contiguity between $\mathbb{P}_{\lambda}$ and $\mathbb{P}_{0}$, valid in the entire regime where contiguity can possibly hold, as well as a formula for the Type-II error of testing between these two distributions.

Under the null distribution $\mathbb{P}_{0}$, on the other hand, the model is equivalent to the widely studied Sherrington-Kirkpatrick model (provided that $P_{\textup{x}}=\frac{1}{2}\delta_{-1}+\frac{1}{2}\delta_{+1}$). In one of the first rigorous results on this model, Aizenman et al. (1987) proved that in the high-temperature regime and in the absence of an external field, the fluctuations of the log-partition function of the model about its mean, which is given by the "annealed" computation, are asymptotically Gaussian with explicit mean and variance. By mapping their result into our setting, we obtain the fluctuations of $\log L(\bm{Y};\lambda)$ under $\mathbb{P}_{0}$ in the non-reconstruction phase, as well as a formula for the Type-I error. Although we only obtain this last formula for the Rademacher prior, we conjecture its validity for arbitrary priors. An interesting symmetry emerges from these results: the limiting Gaussians under $\mathbb{P}_{\lambda}$ and $\mathbb{P}_{0}$ have means of equal magnitude and opposite signs, and equal variances. This symmetry causes the Type-I and Type-II errors to be equal. Adding up these two quantities, we obtain a formula for the total variation distance $D_{\mathsf{TV}}(\mathbb{P}_{\lambda},\mathbb{P}_{0})$.

Our results are in the spirit of those of Onatski et al. (2013, 2014), who studied the likelihood ratio of the joint eigenvalue densities under the spiked covariance model with a sphericity prior, and showed its asymptotic normality below the spectral threshold. Their results thus pertain to eigenvalue-based tests, while ours hold for arbitrary tests, albeit for a simpler model. Our results show in particular that it is still possible to distinguish the planted model $\mathbb{P}_{\lambda}$ from the null model $\mathbb{P}_{0}$ with non-vanishing probability below the reconstruction threshold, i.e., even when estimation of the spike $\bm{x}^{*}$ becomes impossible. Performing such a test in practice of course hinges on the computational problem of efficiently computing the likelihood ratio; we leave open this question of constructing computationally efficient tests in the non-reconstruction phase.\footnote{If one observes the diagonal, then one can test using the trace of $\bm{Y}$. This test would still, however, be suboptimal.}

1.2 Background

The $\mathsf{RS}$ formula.

For $r\geq 0$, consider the function

\[
\psi(r):=\mathbb{E}_{x^{*},z}\log\int\exp\left(\sqrt{r}zx+rxx^{*}-\frac{r}{2}x^{2}\right)\mathrm{d}P_{\textup{x}}(x), \tag{5}
\]

where $z\sim\mathcal{N}(0,1)$ and $x^{*}\sim P_{\textup{x}}$. This is the $\mathsf{KL}$ divergence between the distributions of the random variables $y=\sqrt{r}x^{*}+z$ and $z$. We define the replica-symmetric ($\mathsf{RS}$) potential

\[
F(\lambda,q):=\psi(\lambda q)-\frac{\lambda q^{2}}{4}, \tag{6}
\]

and finally define the $\mathsf{RS}$ formula

\[
\phi_{\mathsf{RS}}(\lambda):=\sup_{q\geq 0}F(\lambda,q). \tag{7}
\]

A central result in this context is that the free energy $F_{N}$ converges to the $\mathsf{RS}$ formula for all $\lambda\geq 0$ (Lesieur et al., 2015, 2017; Deshpande et al., 2016; Barbier et al., 2016; Krzakala et al., 2016; Lelarge and Miolane, 2016):

\[
F_{N}~\longrightarrow~\phi_{\mathsf{RS}}(\lambda).
\]
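For the Rademacher prior, (5) reduces to the closed form $\psi(r)=\mathbb{E}_{z}\log\cosh(\sqrt{r}z+r)-r/2$, and the variational problem (7) is one-dimensional, so $\phi_{\mathsf{RS}}$ is easy to evaluate numerically. The sketch below is our own illustration (the quadrature order and grid are arbitrary choices, not from the paper):

```python
import numpy as np

# Gauss-Hermite quadrature for E_z f(z), z ~ N(0,1) (probabilists' weight)
nodes, weights = np.polynomial.hermite_e.hermegauss(81)
weights = weights / weights.sum()

def psi(r):
    """psi(r) = E log cosh(sqrt(r) z + r) - r/2, the Rademacher case of (5)."""
    u = np.sqrt(r) * nodes + r
    log_cosh = np.logaddexp(u, -u) - np.log(2.0)   # numerically stable log cosh
    return float(weights @ log_cosh) - r / 2.0

def phi_RS(lam, grid=np.linspace(0.0, 1.0, 2001)):
    """Maximize the RS potential (6) over q on a grid, as in (7)."""
    F = np.array([psi(lam * q) - lam * q**2 / 4.0 for q in grid])
    k = int(F.argmax())
    return float(F[k]), float(grid[k])   # (phi_RS(lam), maximizer q*)
```

With these choices one finds $q^{*}=0$ and $\phi_{\mathsf{RS}}=0$ at $\lambda=0.5$, while $q^{*}>0$ and $\phi_{\mathsf{RS}}>0$ at $\lambda=2$, consistent with the threshold $\lambda_{c}=1$ for this prior discussed below.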

The values of $q$ that maximize the $\mathsf{RS}$ potential and their properties play an important role in the theory. Lelarge and Miolane (2016) proved that the map $q\mapsto F(\lambda,q)$ has a unique maximizer $q^{*}=q^{*}(\lambda)$ for all $\lambda\in\mathcal{D}$, where $\mathcal{D}$ is the set of points where the function $\lambda\mapsto\phi_{\mathsf{RS}}(\lambda)$ is differentiable. By convexity of $\phi_{\mathsf{RS}}$ (see next section), $\mathcal{D}=\mathbb{R}_{+}\setminus\{\text{countable set}\}$. Moreover, they showed that the map $\lambda\in\mathcal{D}\mapsto q^{*}(\lambda)$ is non-decreasing, and

\[
\lim_{\substack{\lambda\to 0\\ \lambda\in\mathcal{D}}}q^{*}(\lambda)=\mathbb{E}_{P_{\textup{x}}}[X]^{2},
\qquad\text{and}\qquad
\lim_{\substack{\lambda\to\infty\\ \lambda\in\mathcal{D}}}q^{*}(\lambda)=\mathbb{E}_{P_{\textup{x}}}[X^{2}]. \tag{8}
\]

One should interpret the value $q^{*}(\lambda)$ as the best overlap an estimator $\widehat{\theta}(\bm{Y})$ based on observing $\bm{Y}$ can have with the spike $\bm{x}^{*}$. Indeed, the overlap $|\bm{x}^{\top}\bm{x}^{*}|/N$ between the spike $\bm{x}^{*}$ and a random draw $\bm{x}$ from the posterior $\mathbb{P}_{\lambda}(\cdot|\bm{Y})$ should concentrate in the large-$N$ limit about $q^{*}(\lambda)$ (hence the name "replica symmetry"). A matrix variant of this result (where one estimates $\bm{x}^{*}\bm{x}^{*\top}$) was proved in (Lelarge and Miolane, 2016). In Section 3, we prove strong (vector) versions of this result where, under mild assumptions, optimal rates of convergence are given.

The reconstruction threshold.

The first limit in (8) shows that when the prior $P_{\textup{x}}$ is not centered, it is always possible to have a non-trivial overlap with $\bm{x}^{*}$ for any $\lambda>0$. On the other hand, when the prior has zero mean, and since $q^{*}$ is a non-decreasing function of $\lambda$, it is useful to define the critical value of $\lambda$ below which estimating $\bm{x}^{*}$ becomes impossible:

\[
\lambda_{c}:=\sup\big\{\lambda>0~:~q^{*}(\lambda)=0\big\}. \tag{9}
\]

We refer to $\lambda_{c}$ as the critical or reconstruction threshold. The next lemma establishes a natural bound on $\lambda_{c}$.

Lemma 1.

We have

\[
\lambda_{c}\cdot\left(\mathbb{E}_{P_{\textup{x}}}[X^{2}]\right)^{2}\leq 1. \tag{10}
\]
Proof.

Indeed, assume that $P_{\textup{x}}$ is centered, and let $\lambda>(\mathbb{E}[X^{2}])^{-2}$. Since $\psi'(0)=\frac{1}{2}\mathbb{E}_{P_{\textup{x}}}[X]^{2}=0$ and $\psi''(0)=\frac{1}{2}(\mathbb{E}_{P_{\textup{x}}}[X^{2}])^{2}$, we see that $\partial_{q}F(\lambda,0)=0$ and $\partial_{q}^{2}F(\lambda,0)=\frac{\lambda}{2}\big(\lambda\,\mathbb{E}_{P_{\textup{x}}}[X^{2}]^{2}-1\big)>0$. So $q=0$ cannot be a maximizer of $F(\lambda,\cdot)$. Therefore $q^{*}(\lambda)>0$ and $\lambda\geq\lambda_{c}$. $\blacksquare$

The importance of Lemma 1 stems from the fact that the value $(\mathbb{E}_{P_{\textup{x}}}[X^{2}])^{-2}$ is the spectral threshold previously discussed. Above this value, the first eigenvalue of the matrix $\bm{Y}$ leaves the bulk, while below it the first eigenvalue sticks to the edge of the bulk (Péché, 2006; Capitaine et al., 2009; Féral and Péché, 2007). This value also marks the limit below which the first eigenvector of $\bm{Y}$ captures no information about the spike $\bm{x}^{*}$ (Benaych-Georges and Nadakuditi, 2011). Inequality (10) can be strict or an equality, depending on the prior $P_{\textup{x}}$. For instance, there is equality if the prior is Gaussian or Rademacher (so that the first eigenvector overlaps with the spike as soon as estimation becomes possible at all), and strict inequality in the case of the (sufficiently) sparse Rademacher prior $P_{\textup{x}}=\frac{\rho}{2}\delta_{-1/\sqrt{\rho}}+(1-\rho)\delta_{0}+\frac{\rho}{2}\delta_{+1/\sqrt{\rho}}$. More precisely, there exists a value

\[
\rho^{*}=\inf\big\{\rho\in(0,1)~:~\psi'''(0)<0\big\}\approx 0.092,
\]

such that $\lambda_{c}=1$ for $\rho\geq\rho^{*}$, and $\lambda_{c}<1$ for $\rho<\rho^{*}$. In the latter case, the spectral approach to estimating $\bm{x}^{*}$ fails for $\lambda\in(\lambda_{c},1)$, and it is believed that no polynomial-time algorithm succeeds in this region (Lesieur et al., 2015; Krzakala et al., 2016; Banks et al., 2017).

2 Main results

2.1 Finite-size corrections to the $\mathsf{RS}$ formula

The results we are about to present hold in a possibly slightly smaller set than $\mathcal{D}$. While uniqueness of $q^{*}$ only requires first-order differentiability of the $\mathsf{RS}$ formula, our results need a second derivative to exist. In physics parlance, our results do not hold at values of $\lambda$ at which a particular kind of first-order phase transition occurs, namely, one in which the order parameter $q^{*}$ is not differentiable. The presence of these transitions depends again on the prior $P_{\textup{x}}$. For the Gaussian and Rademacher priors there are no such transitions, while for the sparse Rademacher prior discussed above there is, for every $\rho<\rho^{*}$, one first-order transition at which $q^{*\prime}$ is not defined. Thus we define the set

\[
\mathcal{A}=\big\{\lambda>0~:~\phi_{\mathsf{RS}}\text{ is twice differentiable at }\lambda\big\}.
\]

Since $\phi_{\mathsf{RS}}$ is the point-wise limit of a sequence $(F_{N})$ of convex functions, it is itself convex. By Alexandrov's theorem (Aleksandrov, 1939), the set $\mathcal{A}$ is then of full Lebesgue measure in $\mathbb{R}_{+}$ (cf. $\mathcal{D}=\mathbb{R}_{+}\setminus\{\text{countable set}\}$). Moreover, $(0,\lambda_{c})\subset\mathcal{A}$: for $\lambda\in\mathcal{D}\cap(0,\lambda_{c})$ we have $q^{*}(\lambda)=0$, hence $\phi_{\mathsf{RS}}(\lambda)=0$, and by continuity $\phi_{\mathsf{RS}}$ vanishes on the entire interval $(0,\lambda_{c})$; in particular it is twice differentiable there. Our first main result is to establish the existence of a function $\lambda\mapsto\psi_{\mathsf{RS}}(\lambda)$ defined on $\mathcal{A}$ such that, either below $\lambda_{c}$ or above it when the prior $P_{\textup{x}}$ is not symmetric about the origin, we have

\[
N(F_{N}-\phi_{\mathsf{RS}}(\lambda))\longrightarrow\psi_{\mathsf{RS}}(\lambda).
\]

An explicit formula for $\psi_{\mathsf{RS}}$ will be given, but first we need to introduce some notation. Let $\lambda\in\mathcal{A}$ and consider the quantities

\[
a(0)=\mathbb{E}\big[\langle x^{2}\rangle_{r}^{2}\big]-q^{*2}(\lambda),\quad
a(1)=\mathbb{E}\big[\langle x^{2}\rangle_{r}\langle x\rangle_{r}^{2}\big]-q^{*2}(\lambda),\quad
a(2)=\mathbb{E}\big[\langle x\rangle_{r}^{4}\big]-q^{*2}(\lambda), \tag{11}
\]

where

\[
\langle\,\cdot\,\rangle_{r}=\frac{\int\,\cdot\,\exp\big(\sqrt{r}zx+rxx^{*}-\frac{r}{2}x^{2}\big)\,\mathrm{d}P_{\textup{x}}(x)}{\int\exp\big(\sqrt{r}zx+rxx^{*}-\frac{r}{2}x^{2}\big)\,\mathrm{d}P_{\textup{x}}(x)},
\]

with $r=\lambda q^{*}(\lambda)$, and where the expectation operator $\mathbb{E}$ is w.r.t. $x^{*}\sim P_{\textup{x}}$ and $z\sim\mathcal{N}(0,1)$. The Gibbs measure $\langle\cdot\rangle_{r}$ can be interpreted as the posterior distribution of $x^{*}$ given the observation $y=\sqrt{r}x^{*}+z$. (More on this point of view in Section 3.) Now let

\[
\begin{aligned}
\mu_{1}(\lambda)&=\lambda(a(0)-2a(1)+a(2)),\\
\mu_{2}(\lambda)&=\lambda(a(0)-3a(1)+2a(2)),
\end{aligned} \tag{12}
\]

and finally define

\[
\psi_{\mathsf{RS}}(\lambda):=\frac{1}{4}\left(\log(1-\mu_{1})-2\log(1-\mu_{2})+\lambda\frac{4a(1)-3a(2)}{1-\mu_{1}}-\lambda a(0)\right). \tag{13}
\]

We will prove (Lemma 18) that $\mu_{2}\leq\mu_{1}<1$ for all $\lambda\in\mathcal{A}$, so that this function is well defined on $\mathcal{A}$.

Theorem 2.

For $\lambda\in\mathcal{A}$, if either $\lambda<\lambda_{c}$, or $\lambda>\lambda_{c}$ and the prior $P_{\textup{x}}$ is not symmetric about the origin, then

\[
N\big(F_{N}-\phi_{\mathsf{RS}}(\lambda)\big)=\psi_{\mathsf{RS}}(\lambda)+\mathcal{O}\Big(\frac{1}{\sqrt{N}}\Big),
\]

or equivalently, $D_{\mathsf{KL}}(\mathbb{P}_{\lambda},\mathbb{P}_{0})=N\phi_{\mathsf{RS}}(\lambda)+\psi_{\mathsf{RS}}(\lambda)+\mathcal{O}(1/\sqrt{N})$.

The theorem asserts that, either below the reconstruction threshold, or above it when the prior $P_{\textup{x}}$ is not symmetric, the free energy $F_{N}$ has a finite-size correction of order $1/N$ to its limit $\phi_{\mathsf{RS}}$, and a subsequent term of order $N^{-3/2}$ in the expansion. In the case $\lambda>\lambda_{c}$ with a symmetric prior, the problem is invariant under a sign flip of the spike, so the overlap $\bm{x}^{\top}\bm{x}^{*}/N$ has a symmetric distribution, and hence concentrates equiprobably about two distinct values $\pm q^{*}(\lambda)$. Our techniques do not survive this symmetry, and resolving this case seems to require a new approach.

We see that $D_{\mathsf{KL}}(\mathbb{P}_{\lambda},\mathbb{P}_{0})$ is an extensive quantity in $N$ whenever $\phi_{\mathsf{RS}}(\lambda)>0$, or equivalently, $\lambda>\lambda_{c}$. On the other hand, this $\mathsf{KL}$ divergence is of constant order below $\lambda_{c}$:

Centered prior.

Let us consider the case where the prior $P_{\textup{x}}$ has zero mean and unit variance (the latter can be assumed without loss of generality by rescaling $\lambda$), so that Lemma 1 reads $\lambda_{c}\leq 1$. If $\lambda<\lambda_{c}$, we have $q^{*}(\lambda)=0$ and $\phi_{\mathsf{RS}}(\lambda)=0$, and one can check that in this case

\[
a(0)=(\mathbb{E}_{P_{\textup{x}}}[X^{2}])^{2}=1,\quad
a(1)=\mathbb{E}_{P_{\textup{x}}}[X^{2}]\,\mathbb{E}_{P_{\textup{x}}}[X]^{2}=0,\quad
a(2)=\mathbb{E}_{P_{\textup{x}}}[X]^{4}=0.
\]

Therefore, since $\mu_{1}=\mu_{2}=\lambda$ in this case, expression (13) simplifies to

\[
\psi_{\mathsf{RS}}(\lambda)=\frac{1}{4}\left(-\log(1-\lambda)-\lambda\right).
\]

By the above calculation, we have a formula for the $\mathsf{KL}$ divergence between $\mathbb{P}_{\lambda}$ and $\mathbb{P}_{0}$ below the reconstruction threshold $\lambda_{c}$ (see the plot in Figure 1):

Corollary 3.

Assume the prior $P_{\textup{x}}$ is centered and of unit variance. Then for all $\lambda<\lambda_{c}$,

\[
D_{\mathsf{KL}}(\mathbb{P}_{\lambda},\mathbb{P}_{0})=\frac{1}{4}\left(-\log(1-\lambda)-\lambda\right)+\mathcal{O}\Big(\frac{1}{\sqrt{N}}\Big). \tag{14}
\]

More information on $\psi_{\mathsf{RS}}$.

Expression (13) looks mysterious at first sight; let us briefly explain its origin. A slightly less processed expression for $\psi_{\mathsf{RS}}$ is the following:

\[
\psi_{\mathsf{RS}}(\lambda)=\frac{1}{4}\int_{0}^{1}\left(-\frac{\mu_{1}}{1-t\mu_{1}}+\frac{2\mu_{2}}{1-t\mu_{2}}+\lambda\frac{4a(1)-3a(2)}{(1-t\mu_{1})^{2}}\right)\mathrm{d}t-\frac{\lambda}{4}a(0),
\]

after which (13) follows by simple integration. The integrand in the above expression is obtained, as we will show, as the first entry $z(0)$ of the solution $\bm{z}=[z(0),z(1),z(2)]^{\top}$ of the $3\times 3$ linear system

\[
(\bm{I}-t\bm{A})\bm{z}=\bm{a},
\]

where $\bm{a}=[a(0),a(1),a(2)]^{\top}$ and $\bm{A}$ is the "cavity" matrix

\[
\bm{A}:=\lambda\cdot\begin{bmatrix}
a(0)&-2a(1)&a(2)\\
a(1)&a(0)-a(1)-2a(2)&-2a(1)+3a(2)\\
a(2)&4a(1)-6a(2)&a(0)-6a(1)+6a(2)
\end{bmatrix}.
\]

The above matrix happens to have two eigenvalues which are exactly $\mu_{1}$ and $\mu_{2}$. The matrix $\bm{A}$ and the above linear system will emerge naturally as a result of the cavity method. On the other hand, the integral over the time parameter $t$ is along an interpolation path invented by Guerra (2001) (see also Guerra and Toninelli, 2002b) in the context of the Sherrington-Kirkpatrick model, and the integrand can be interpreted as the asymptotic variance in a central limit theorem satisfied by the overlap between two "replicas" under the law induced by a certain interpolating Gibbs measure. A definition of these notions, with the corresponding results, can be found in Sections 3 and 4. The full execution of the cavity method is relegated to Section 5.
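The relations just stated are easy to verify numerically. The sketch below is our own check (the coefficient values are arbitrary illustrative numbers, not computed from any prior, chosen so that $\mu_{1},\mu_{2}<1$): it confirms that $\mu_{1}$ and $\mu_{2}$ are eigenvalues of $\bm{A}$, that the time integral above reproduces the closed form (13), and that (13) reduces to $\frac14(-\log(1-\lambda)-\lambda)$ when $a(0)=1$, $a(1)=a(2)=0$.

```python
import numpy as np

def cavity_matrix(lam, a0, a1, a2):
    """The cavity matrix A defined above."""
    return lam * np.array([
        [a0,   -2*a1,            a2              ],
        [a1,    a0 - a1 - 2*a2, -2*a1 + 3*a2     ],
        [a2,    4*a1 - 6*a2,     a0 - 6*a1 + 6*a2],
    ])

def psi_RS_closed(lam, a0, a1, a2):
    """Expression (13)."""
    mu1 = lam * (a0 - 2*a1 + a2)
    mu2 = lam * (a0 - 3*a1 + 2*a2)
    return 0.25 * (np.log(1 - mu1) - 2*np.log(1 - mu2)
                   + lam * (4*a1 - 3*a2) / (1 - mu1) - lam * a0)

def psi_RS_integral(lam, a0, a1, a2, n=200001):
    """The time-integral representation, evaluated by the trapezoid rule."""
    mu1 = lam * (a0 - 2*a1 + a2)
    mu2 = lam * (a0 - 3*a1 + 2*a2)
    t = np.linspace(0.0, 1.0, n)
    f = -mu1/(1 - t*mu1) + 2*mu2/(1 - t*mu2) + lam*(4*a1 - 3*a2)/(1 - t*mu1)**2
    integral = float((0.5 * (f[0] + f[-1]) + f[1:-1].sum()) / (n - 1))
    return 0.25 * integral - 0.25 * lam * a0

lam, a0, a1, a2 = 0.5, 1.0, 0.3, 0.2
mu1, mu2 = lam*(a0 - 2*a1 + a2), lam*(a0 - 3*a1 + 2*a2)
eigs = np.linalg.eigvals(cavity_matrix(lam, a0, a1, a2))
```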

2.2 Fluctuations below the reconstruction threshold

Corollary 14 asserts that below the reconstruction threshold, the expectation of the log-likelihood ratio $\log L(\bm{Y};\lambda)$ under $\mathbb{P}_{\lambda}$ is of constant order (in $N$) and is asymptotically equal to $(-\log(1-\lambda)-\lambda)/4$. In this section we are interested in the fluctuations of this quantity about its expectation. It can be seen by a standard concentration-of-measure argument that for all $\lambda>0$, $\log L(\bm{Y};\lambda)$ concentrates about its expectation with fluctuations bounded by $\mathcal{O}(\sqrt{N})$. While this bound is likely to be of the right order above $\lambda_{c}$ (this is true for the SK model; see Guerra and Toninelli, 2002a), it is very pessimistic below $\lambda_{c}$. Indeed, we will show that the fluctuations are of constant order with a Gaussian limiting law in this regime. This phenomenon was noticed early on in the case of the SK model: Aizenman et al. (1987) showed that in the absence of an external field, the log-partition function of this model has (shifted) Gaussian fluctuations about its easily computed "annealed average" at high temperature. We will directly deduce from their result a central limit theorem for $\log L(\bm{Y};\lambda)$ under $\mathbb{P}_{0}$ in the case where the prior $P_{\textup{x}}$ is Rademacher. Furthermore, a proof by Talagrand (2011b) of their result provided us with a road map for proving a similar result under $\mathbb{P}_{\lambda}$. We now present our second main result, along with consequences for hypothesis testing. For $\lambda<1$, let

$$\mu(\lambda)=\frac{1}{4}\left(-\log(1-\lambda)-\lambda\right),\qquad\text{and}\qquad\sigma^{2}(\lambda)=2\mu(\lambda).$$
Theorem 4.

Assume the prior $P_{\textup{x}}$ is centered and of unit variance.

  • (i) For all $\lambda<\lambda_c$, if $\bm{Y}\sim\mathbb{P}_\lambda$, then
    $$\log L(\bm{Y};\lambda)\rightsquigarrow\mathcal{N}(\mu,\sigma^{2}).$$

  • (ii) Under the additional condition $P_{\textup{x}}=\frac{1}{2}\delta_{-1}+\frac{1}{2}\delta_{+1}$, if $\bm{Y}\sim\mathbb{P}_0$, then for all $\lambda<1$,
    $$\log L(\bm{Y};\lambda)\rightsquigarrow\mathcal{N}(-\mu,\sigma^{2}).$$

The symbol "$\rightsquigarrow$" denotes convergence in distribution as $N\to\infty$. The formal connection to the SK model and the proof of the above theorem are presented in Section 6.
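As an illustration of Theorem 4, when $P_{\textup{x}}$ is Rademacher the likelihood ratio $L(\bm{Y};\lambda)$ is an average of $e^{-H(\bm{x})}$ over the $2^N$ sign vectors and can be evaluated exactly for small $N$. The following sketch is illustrative only and not part of the proofs; the values of $N$ and $\lambda$ are arbitrary choices, and $\bm{Y}$ is drawn from the null model ($Y_{ij}=W_{ij}$):

```python
import numpy as np
from itertools import product

def mu(lam):
    # limiting mean of log L under P_lambda; the limiting variance is 2*mu(lam)
    return 0.25 * (-np.log(1.0 - lam) - lam)

def log_likelihood_ratio(Y, lam):
    # Brute-force log L(Y; lam) for the Rademacher prior (x_i^2 = 1),
    # feasible only for small N: averages e^{-H(x)} over all 2^N sign vectors.
    N = Y.shape[0]
    iu = np.triu_indices(N, 1)
    const = -lam * len(iu[0]) / (2.0 * N)   # -(lam/2N) sum_{i<j} x_i^2 x_j^2
    exps = []
    for x in product([-1.0, 1.0], repeat=N):
        x = np.asarray(x)
        exps.append(const + np.sqrt(lam / N) * np.sum(Y[iu] * np.outer(x, x)[iu]))
    exps = np.array(exps)
    m = exps.max()
    return m + np.log(np.mean(np.exp(exps - m)))  # stable log-mean-exp

rng = np.random.default_rng(0)
N, lam = 10, 0.5
W = rng.standard_normal((N, N))  # null model: only entries with i < j are used
print(log_likelihood_ratio(W, lam))  # compare with the limiting mean -mu(lam)
```

Repeating this over many draws of $\bm{W}$ produces a histogram whose mean and spread approach $\mathcal{N}(-\mu,\sigma^2)$ as $N$ grows, in line with statement (ii).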

A remarkable feature is the symmetry between the above two statements. Roughly speaking, this symmetry has its roots in the fact that the model under the alternative distribution $\mathbb{P}_\lambda$ "lives on the Nishimori line": under $\mathbb{P}_\lambda$, which is the spiked model, the interaction of the spike $\bm{x}^*$ with a replica $\bm{x}^{(1)}$ creates terms that account for twice the contribution of the interaction between two independent replicas $\bm{x}^{(1)}$ and $\bm{x}^{(2)}$, and thus flips the sign of the mean from $-\mu$ to $\mu$. This mechanism will become apparent from the proof. Moreover, the fact that the mean is half the variance in these limiting Gaussians has interesting consequences for hypothesis testing. The next subsection is devoted to this problem.

We believe that statement (ii) remains valid up to $\lambda_c$ for a general prior (of zero mean and unit variance). It is possible to prove the convergence up to some value $\lambda_0\leq\lambda_c$ with essentially the same approach as ours, but reaching the optimal threshold seems to require more technical work. In particular, our interpolation bound (see our "main estimate," Section 4.2) would have to be significantly improved to handle this case. Progress will be reported in future work.

We also point out that similar fluctuation results were recently proved by Baik and Lee, (2016, 2017) for a spherical model where one integrates over the uniform measure on the sphere in the definition of $L(\bm{Y};\lambda)$. Their model, due to its integrable nature, is amenable to analysis using tools from random matrix theory. The authors are thus also able to analyze a "low temperature" regime (absent in our problem) where the fluctuations are no longer Gaussian but given by the Tracy–Widom distribution. Their techniques seem to be tied to the spherical case, however.

2.2.1 Strong and weak detection below $\lambda_c$.

Consider the problem of deciding whether an array of observations $\bm{Y}=\{Y_{ij}:1\leq i<j\leq N\}$ was generated from $\mathbb{P}_\lambda$ for a fixed $\lambda>0$ or from $\mathbb{P}_0$. Let us denote by $\bm{H}_0:\bm{Y}\sim\mathbb{P}_0$ the null hypothesis and by $\bm{H}_\lambda:\bm{Y}\sim\mathbb{P}_\lambda$ the alternative hypothesis. Two formulations of this problem exist: one would like to construct a sequence of measurable tests $T:\mathbb{R}^{N(N-1)/2}\mapsto\{0,1\}$, returning "0" for $\bm{H}_0$ and "1" for $\bm{H}_\lambda$, for which either

$$\lim_{N\to\infty}\max\Big\{\mathbb{P}_\lambda(T(\bm{Y})=0),~\mathbb{P}_0(T(\bm{Y})=1)\Big\}=0,\qquad(15)$$

or, less stringently, the total misclassification error, or risk,

$$\mathsf{err}(T):=\mathbb{P}_\lambda(T(\bm{Y})=0)+\mathbb{P}_0(T(\bm{Y})=1)\qquad(16)$$

is minimized among all possible tests $T$. The question of the existence of a test meeting requirement (15) is referred to as the strong detection problem, and the question of minimizing the criterion (16) is referred to as weak detection, or simply the hypothesis testing problem.

Strong detection.

Using a second moment argument based on the computation of a truncated version of $\mathbb{E} L(\bm{Y};\lambda)^2$, Banks et al., (2017) and Perry et al., (2016) showed that $\mathbb{P}_\lambda$ and $\mathbb{P}_0$ are mutually contiguous when $\lambda<\lambda_0$, where the latter quantity equals $\lambda_c$ for some priors $P_{\textup{x}}$ while it is suboptimal for others (e.g., the sparse Rademacher case; see the discussion below). It is easy to see that contiguity implies the impossibility of strong detection since, for instance, if $\mathbb{P}_0(T(\bm{Y})=1)\to 0$ then $\mathbb{P}_\lambda(T(\bm{Y})=0)\to 1$. Here we show that Theorem 4 provides a more powerful approach to contiguity:

Corollary 5.

Assume the prior $P_{\textup{x}}$ is centered and of unit variance. Then for all $\lambda<\lambda_c$, $\mathbb{P}_\lambda$ and $\mathbb{P}_0$ are mutually contiguous.

Proof.

A consequence of statement (i) in Theorem 4 is that if
$$\frac{\mathrm{d}\mathbb{P}_0}{\mathrm{d}\mathbb{P}_\lambda}~\rightsquigarrow~U$$
under $\mathbb{P}_\lambda$ along some subsequence and for some random variable $U$, then by the continuous mapping theorem we necessarily have
$$U=\exp\mathcal{N}(-\mu,\sigma^{2}).$$
We have $\Pr(U>0)=1$, and since $\mu=\frac{1}{2}\sigma^{2}$, we have $\mathbb{E}U=1$. We now conclude using Le Cam's first lemma in both directions (Lemma 6.4 or Example 6.5, Van der Vaart, 2000). $\blacksquare$

This approach allows one to circumvent second moment computations, which are not guaranteed to be tight in general and necessitate careful, prior-specific conditioning that truncates away undesirable events.

We note that in the case of the sparse Rademacher prior $P_{\textup{x}}=\frac{\rho}{2}\delta_{-1/\sqrt{\rho}}+(1-\rho)\delta_{0}+\frac{\rho}{2}\delta_{+1/\sqrt{\rho}}$, contiguity holds for all $\lambda<1$ as soon as $\rho\geq\rho^*\approx 0.092$ by the above corollary, thus closing the gaps in the results of Banks et al., (2017) and Perry et al., (2016). Indeed, as argued below Lemma 1, the reconstruction and spectral thresholds are equal ($\lambda_c=1$) for all $\rho\geq\rho^*$, and differ ($\lambda_c<1$) below $\rho^*$. This implies that when $\rho\geq\rho^*$, strong detection is impossible for $\lambda<1$ and possible otherwise, while when $\rho<\rho^*$, it becomes impossible only below $\lambda_c$ and possible otherwise.

Weak detection.

We have seen that strong detection is possible if and only if $\lambda>\lambda_c$. It is then natural to ask whether weak detection is possible below $\lambda_c$, i.e., whether it is possible to test with accuracy better than that of a random guess below the reconstruction threshold. The answer is yes, and this is another consequence of Theorem 4. More precisely, the optimal test minimizing the risk (16) is the likelihood ratio test, which rejects the null hypothesis $\bm{H}_0$ (i.e., returns "1") if $L(\bm{Y};\lambda)>1$, and its error is

$$\mathsf{err}^*(\lambda)=\mathbb{P}_\lambda(L(\bm{Y};\lambda)\leq 1)+\mathbb{P}_0(L(\bm{Y};\lambda)>1)=1-D_{\mathsf{TV}}(\mathbb{P}_\lambda,\mathbb{P}_0).\qquad(17)$$

One can readily deduce from Theorem 4 the Type-I and Type-II errors of the likelihood ratio test: for all $\lambda<\lambda_c$ the Type-II error is

$$\mathbb{P}_\lambda(\log L(\bm{Y};\lambda)\leq 0)=\int_{-\infty}^{0}\frac{1}{\sqrt{2\pi}\,\sigma}e^{-(t-\mu)^{2}/2\sigma^{2}}\,\mathrm{d}t+o_N(1)=\frac{1}{2}\mathsf{erfc}\left(\frac{\sqrt{\mu}}{2}\right)+o_N(1),$$

and in the case of the Rademacher prior, the Type-I error is

$$\mathbb{P}_0(\log L(\bm{Y};\lambda)>0)=\int_{0}^{+\infty}\frac{1}{\sqrt{2\pi}\,\sigma}e^{-(t+\mu)^{2}/2\sigma^{2}}\,\mathrm{d}t+o_N(1)=\frac{1}{2}\mathsf{erfc}\left(\frac{\sqrt{\mu}}{2}\right)+o_N(1)$$

for all $\lambda<1$. Here, $\mathsf{erfc}(x)=\frac{2}{\sqrt{\pi}}\int_x^\infty e^{-t^2}\,\mathrm{d}t$ is the complementary error function. These can be combined into a formula for $\mathsf{err}^*(\lambda)$ and the total variation distance between $\mathbb{P}_\lambda$ and $\mathbb{P}_0$ (see plot in Figure 1):

Corollary 6.

Assume $P_{\textup{x}}=\frac{1}{2}\delta_{-1}+\frac{1}{2}\delta_{+1}$. For all $\lambda<1$, we have
$$\lim_{N\to\infty}\mathsf{err}^*(\lambda)=1-\lim_{N\to\infty}D_{\mathsf{TV}}(\mathbb{P}_\lambda,\mathbb{P}_0)=\mathsf{erfc}\left(\frac{\sqrt{\mu(\lambda)}}{2}\right).\qquad(18)$$

We similarly conjecture that the formula for the Type-I error, hence formula (18), should remain correct up to $\lambda_c$ for all (bounded) priors with zero mean and unit variance.
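As a numerical sanity check on formula (18) (illustrative only; the values of $\lambda$ are arbitrary), one can verify that the sum of the two limiting Gaussian tail probabilities above coincides with the closed form $\mathsf{erfc}(\sqrt{\mu(\lambda)}/2)$:

```python
from math import erfc, log, sqrt

def mu(lam):
    return 0.25 * (-log(1.0 - lam) - lam)

def normal_cdf(z):
    # standard normal CDF expressed via the complementary error function
    return 0.5 * erfc(-z / sqrt(2.0))

for lam in (0.2, 0.5, 0.9):
    m = mu(lam)
    s = sqrt(2.0 * m)                         # sigma(lam), since sigma^2 = 2*mu
    type_2 = normal_cdf((0.0 - m) / s)        # P_lam(log L <= 0): N(mu, s^2) below 0
    type_1 = 1.0 - normal_cdf((0.0 + m) / s)  # P_0(log L > 0): N(-mu, s^2) above 0
    total = type_1 + type_2                   # limiting err*(lam)
    closed_form = erfc(sqrt(m) / 2.0)         # formula (18)
    assert abs(total - closed_form) < 1e-12
    print(lam, total)
```

The key algebraic step is that $\mu/(\sigma\sqrt{2})=\sqrt{\mu}/2$ when $\sigma^2=2\mu$, which is why the Type-I and Type-II errors coincide.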

Figure 1: Plots of formulas (18) and (14).

3 Overlap convergence: optimal rates

A crucial component of proving our main results is understanding the rate of convergence of the overlap $\bm{x}^\top\bm{x}^*/N$, where $\bm{x}$ is drawn from $\mathbb{P}_\lambda(\cdot|\bm{Y})$, to its limit $q^*(\lambda)$. By Bayes' rule, we see that

$$\mathrm{d}\mathbb{P}_\lambda(\bm{x}|\bm{Y})=\frac{e^{-H(\bm{x})}\,\mathrm{d}P_{\textup{x}}^{\otimes N}(\bm{x})}{\int e^{-H(\bm{x})}\,\mathrm{d}P_{\textup{x}}^{\otimes N}(\bm{x})},\qquad(19)$$

where $H$ is the Hamiltonian
$$\begin{aligned}-H(\bm{x})&:=-\frac{\lambda}{2N}\sum_{i<j}x_i^2x_j^2+\sqrt{\frac{\lambda}{N}}\sum_{i<j}Y_{ij}x_ix_j\qquad(20)\\&=-\frac{\lambda}{2N}\sum_{i<j}x_i^2x_j^2+\sqrt{\frac{\lambda}{N}}\sum_{i<j}W_{ij}x_ix_j+\frac{\lambda}{N}\sum_{i<j}x_ix_i^*x_jx_j^*.\end{aligned}$$

From formulas (3) and (4), it is straightforward to see that
$$F_N=\frac{1}{N}\mathbb{E}\log\int e^{-H(\bm{x})}\,\mathrm{d}P_{\textup{x}}^{\otimes N}(\bm{x}).$$
This provides another way of interpreting $F_N$: as the expected log-partition function (or normalizing constant) of the posterior $\mathbb{P}_\lambda(\cdot|\bm{Y})$. For an integer $n\geq 1$ and $f:(\mathbb{R}^N)^{n+1}\mapsto\mathbb{R}$, we define the Gibbs average of $f$ w.r.t. $H$ as

$$\left\langle f(\bm{x}^{(1)},\cdots,\bm{x}^{(n)},\bm{x}^*)\right\rangle:=\frac{\int f(\bm{x}^{(1)},\cdots,\bm{x}^{(n)},\bm{x}^*)\prod_{l=1}^n e^{-H(\bm{x}^{(l)})}\,\mathrm{d}P_{\textup{x}}^{\otimes N}(\bm{x}^{(l)})}{\int\prod_{l=1}^n e^{-H(\bm{x}^{(l)})}\,\mathrm{d}P_{\textup{x}}^{\otimes N}(\bm{x}^{(l)})}.\qquad(21)$$

This is nothing other than the average of $f$ with respect to $\mathbb{P}_\lambda(\cdot|\bm{Y})^{\otimes n}$. The variables $\bm{x}^{(l)}$, $l=1,\cdots,n$, are called replicas, and are interpreted as random variables independently drawn from the posterior. When $n=1$ we simply write $f(\bm{x},\bm{x}^*)$ instead of $f(\bm{x}^{(1)},\bm{x}^*)$. Throughout the rest of the manuscript, we use the following notation: for $l,l'=1,\cdots,n,*$, we let

$$R_{l,l'}:=\bm{x}^{(l)}\cdot\bm{x}^{(l')}=\frac{1}{N}\sum_{i=1}^N x_i^{(l)}x_i^{(l')}.$$

In this section we show convergence of the first four moments of the overlap at optimal rates under some conditions: either the prior $P_{\textup{x}}$ is not symmetric about the origin, or the Hamiltonian $H$ is "perturbed" in the following sense. Let $t\in[0,1]$ and consider the "interpolating" Hamiltonian (this qualification will become clear in the next section)

$$\begin{aligned}-H_t(\bm{x})&:=-\frac{t\lambda}{2N}\sum_{i<j}x_i^2x_j^2+\sqrt{\frac{t\lambda}{N}}\sum_{i<j}W_{ij}x_ix_j+\frac{t\lambda}{N}\sum_{i<j}x_ix_i^*x_jx_j^*\qquad(22)\\&\quad-\frac{(1-t)r}{2}\sum_{i=1}^N x_i^2+\sqrt{(1-t)r}\sum_{i=1}^N z_ix_i+(1-t)r\sum_{i=1}^N x_ix_i^*,\end{aligned}$$

where the $z_i$'s are i.i.d. standard Gaussian r.v.'s independent of everything else, and $r=\lambda q^*(\lambda)$. We similarly define the Gibbs average $\langle\cdot\rangle_t$ as in (21) with $H$ replaced by $H_t$. We now state a fundamental property satisfied by both $\langle\cdot\rangle$ and $\langle\cdot\rangle_t$.

The Nishimori property.

The fact that the Gibbs measure $\langle\cdot\rangle$ is a posterior distribution (19) has far-reaching consequences. A crucial implication is that the $(n+1)$-tuples $(\bm{x}^{(1)},\cdots,\bm{x}^{(n+1)})$ and $(\bm{x}^{(1)},\cdots,\bm{x}^{(n)},\bm{x}^*)$ have the same law under $\mathbb{E}\langle\cdot\rangle$. This fact, which is a simple consequence of Bayes' rule (see Proposition 16, Lelarge and Miolane, 2016), prevents replica symmetry from breaking (see Korada and Macris, 2009). In particular, $R_{1,2}$ and $R_{1,*}$ have the same distribution. This bears the name of the Nishimori property in the spin-glass literature (Nishimori, 2001). Moreover, this property is preserved under the interpolating Gibbs measure $\langle\cdot\rangle_t$ for all $t\in[0,1]$. Indeed, the interpolation is constructed in such a way that $\langle\cdot\rangle_t$ is the posterior distribution of the signal $\bm{x}^*$ given the augmented set of observations

$$\begin{cases}Y_{ij}=\sqrt{\frac{t\lambda}{N}}x_i^*x_j^*+W_{ij},&1\leq i<j\leq N,\\ y_i=\sqrt{(1-t)r}\,x_i^*+z_i,&1\leq i\leq N,\end{cases}\qquad(23)$$

where one receives side information about $\bm{x}^*$ from a scalar Gaussian channel, $r=\lambda q^*(\lambda)$, and the signal-to-noise ratios of the two channels are altered in a time-dependent way. Now we state our concentration result.
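For concreteness, the augmented observation model (23) can be sampled directly. In this sketch the Rademacher prior and the values of $N$, $\lambda$, $t$ and $q^*$ are arbitrary illustrative choices (in the actual interpolation, $r=\lambda q^*(\lambda)$ is determined by the model, and $q^*(\lambda)=0$ below $\lambda_c$):

```python
import numpy as np

rng = np.random.default_rng(0)
N, lam, t, q_star = 500, 1.5, 0.6, 0.4   # illustrative values only
r = lam * q_star
x_star = rng.choice([-1.0, 1.0], size=N)  # Rademacher spike (example prior)
iu = np.triu_indices(N, 1)
W = rng.standard_normal(len(iu[0]))       # noise W_ij, i < j
# matrix channel: Y_ij = sqrt(t*lam/N) x*_i x*_j + W_ij
Y = np.sqrt(t * lam / N) * np.outer(x_star, x_star)[iu] + W
# scalar side channel: y_i = sqrt((1-t) r) x*_i + z_i
y = np.sqrt((1.0 - t) * r) * x_star + rng.standard_normal(N)
print(Y.shape, y.shape)
```

At $t=1$ the side channel vanishes and one recovers the original spiked model; at $t=0$ the matrix channel vanishes and the coordinates decouple.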

Theorem 7.

For all $\lambda\in\mathcal{A}$ and all $t\in[0,1]$, there exist constants $K(\lambda)\geq 0$ and $c(t)\geq 0$ such that
$$\mathbb{E}\left\langle\left(R_{1,*}-q^*\right)^4\right\rangle_t\leq K(\lambda)\Big(\frac{1}{N^2}+e^{-c(t)N}\Big).\qquad(24)$$

Moreover, $c(t)>0$ on $[0,1)$, and if either $\lambda<\lambda_c$ or $P_{\textup{x}}$ is not symmetric about the origin, then $c(t)\geq c_0$ for some constant $c_0=c_0(\lambda)>0$. Otherwise, $c(t)\sim c_0(1-t)^2$ as $t\to 1$.

If $P_{\textup{x}}$ is symmetric about the origin then the distribution of $R_{1,*}$ under $\mathbb{E}\langle\cdot\rangle$ is also symmetric, so $\mathbb{E}\langle R_{1,*}\rangle=0$. If moreover $q^*(\lambda)>0$ (i.e., $\lambda>\lambda_c$) then (24) becomes trivial at $t=1$ since both sides are constant. On the other hand, if either $t<1$ or $P_{\textup{x}}$ is asymmetric, the sign symmetry of the spike is broken. This forces the overlap to be positive and hence to concentrate about $q^*(\lambda)$. Finally, if $\lambda<\lambda_c$, then $q^*(\lambda)=0$ and the sign symmetry becomes irrelevant since the overlap converges to zero regardless. Let us mention that in the symmetric unperturbed case ($t=1$), we expect a variant of (24) to hold where $R_{1,*}$ is replaced by its absolute value in the statement, and the upper bound would be $K/N^2$. Unfortunately, our methods do not allow us to prove such a statement, but we are able to prove a weaker result (see Lemma 13): for all $\epsilon>0$,

$$\mathbb{E}\big\langle\mathds{1}\big\{\big||R_{1,*}|-q^*\big|\geq\epsilon\big\}\big\rangle\longrightarrow 0.\qquad(25)$$

Although this is a minor technical point, we also note that the estimate $c(t)\sim c_0(1-t)^2$ in the statement is suboptimal. A heuristic argument suggests $c(t)\sim c_0(1-t)$ as $t\to 1$, but we are currently unable to justify it rigorously.

MMSE.

The bound (24) can be used to deduce the optimal error of estimating $\bm{x}^*$ based on the observations (23). The posterior mean $\langle\bm{x}\rangle_t$ is the estimator with minimal mean squared error (MMSE) among all estimators $\widehat{\theta}(\bm{Y},\bm{y})\in\mathbb{R}^N$, and the MMSE is

$$\begin{aligned}\frac{1}{N}\sum_{i=1}^N\mathbb{E}\left[(x_i^*-\langle x_i\rangle_t)^2\right]&=\mathbb{E}_{P_{\textup{x}}}[X^2]-\frac{2}{N}\sum_{i=1}^N\mathbb{E}\langle x_ix_i^*\rangle_t+\frac{1}{N}\sum_{i=1}^N\mathbb{E}\langle x_i\rangle_t^2\\&=\mathbb{E}_{P_{\textup{x}}}[X^2]-\mathbb{E}\langle R_{1,*}\rangle_t.\end{aligned}$$

The last line follows from the Nishimori property, since $\mathbb{E}\langle x\rangle_t^2=\mathbb{E}\langle x^{(1)}x^{(2)}\rangle_t=\mathbb{E}\langle xx^*\rangle_t$. Theorem 7 implies in particular (under the conditions of its validity) that $\mathbb{E}\langle R_{1,*}\rangle_t\to q^*(\lambda)$, yielding the value of the MMSE. In particular, it is possible to estimate the spike $\bm{x}^*$ from the observations (23) with non-trivial accuracy if and only if $\lambda>\lambda_c$. Note that at $t=1$ (no side information) the result still holds below $\lambda_c$ or when the prior is not symmetric. Otherwise, as mentioned before, the problem is invariant under a sign flip of $\bm{x}^*$, so one has to change the measure of performance. Besides the result (25), we are unable to say much in this situation.
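The Nishimori identity used in the last step can be checked by Monte Carlo in the simplest setting: the scalar Gaussian channel $y=\sqrt{r}\,x^*+z$ with a Rademacher prior, for which the posterior mean is $\langle x\rangle=\tanh(\sqrt{r}\,y)$. The choice of $r$ and the sample size below are arbitrary; this is a sketch of the identity, not part of the proofs:

```python
import numpy as np

# Check E<x>^2 = E<x x*> for the channel y = sqrt(r) x* + z, x* Rademacher.
rng = np.random.default_rng(0)
r, n = 0.8, 200_000
x_star = rng.choice([-1.0, 1.0], size=n)
y = np.sqrt(r) * x_star + rng.standard_normal(n)
post_mean = np.tanh(np.sqrt(r) * y)   # posterior mean <x> given y
lhs = np.mean(post_mean ** 2)         # E <x>^2  (two-replica overlap)
rhs = np.mean(x_star * post_mean)     # E <x x*> (replica-signal overlap)
print(lhs, rhs)                       # the two averages agree up to MC error
```

The agreement of the two averages is exactly the statement that $R_{1,2}$ and $R_{1,*}$ share a distribution under $\mathbb{E}\langle\cdot\rangle$, specialized to one coordinate.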

Asymptotic variance.

By Jensen’s inequality we deduce from (24) the convergence of the second moment:

$$\mathbb{E}\left\langle\left(R_{1,*}-q^*\right)^2\right\rangle_t\leq K(\lambda)\Big(\frac{1}{N}+e^{-c(t)N}\Big).\qquad(26)$$

To establish our finite-size correction result (Theorem 2) we need a result stronger than (26), namely that $N\cdot\mathbb{E}\langle(R_{1,*}-q^*)^2\rangle_t$ converges to a limit. For $t\in[0,1]$ and $\lambda\in\mathcal{A}$, we let

$$\Delta_{\mathsf{RS}}(\lambda;t):=\frac{1}{\lambda}\left(-\frac{\mu_1}{1-t\mu_1}+\frac{2\mu_2}{1-t\mu_2}+\lambda\frac{4a(1)-3a(2)}{(1-t\mu_1)^2}\right),\qquad(27)$$

where $\mu_1$ and $\mu_2$ are defined in (12).

Theorem 8.

For all $\lambda\in\mathcal{A}$ and all $t\in[0,1]$, there exist constants $K(\lambda)\geq 0$ and $c(t)\geq 0$ such that
$$\left|N\cdot\mathbb{E}\left\langle\left(R_{1,*}-q^*\right)^2\right\rangle_t-\Delta_{\mathsf{RS}}(\lambda;t)\right|\leq K(\lambda)\left(\frac{1}{\sqrt{N}}+Ne^{-c(t)N}\right).$$

Moreover, $c(t)>0$ on $[0,1)$, and if either $\lambda<\lambda_c$ or $P_{\textup{x}}$ is not symmetric about the origin, then $c(t)\geq c_0$ for some constant $c_0=c_0(\lambda)>0$. Otherwise, $c(t)\sim c_0(1-t)^2$ as $t\to 1$.

The proofs of Theorems 7 and 8 rely on the cavity method, and will be presented in Section 5. Finally, the techniques we use could easily be extended to prove convergence of all the moments at optimal rates: for all integers $k$,
$$\mathbb{E}\left\langle\left(R_{1,*}-q^*\right)^{2k}\right\rangle_t\leq\frac{K(k)}{N^k}+K(k)e^{-c(k,t)N},$$

but we will not need this stronger statement.

4 The interpolation method

In this section we present the interpolation method of Guerra, (2001). All our main arguments rely, in one way or another, on this method. Along the way, we prove Theorem 2. The idea is to construct a continuous interpolation path between the Hamiltonian $H$ and a simpler Hamiltonian that decouples all the variables, and to analyze the incremental change in the free energy along the path. We present two versions of this method: the classical one, applied to the free energy of the entire system, and a second one, applied to the free energy of a more restricted system.

4.1 The Guerra interpolation

Our interpolating Hamiltonian is $H_t$ from (22) with $r=\lambda q$ for some $q\geq 0$. Now we consider the interpolating free energy

$$\varphi(t):=\frac{1}{N}\mathbb{E}\log\int e^{-H_t(\bm{x})}\,\mathrm{d}P_{\textup{x}}^{\otimes N}(\bm{x}).\qquad(28)$$

We see that $\varphi(1)=F_N$ and $\varphi(0)=\psi(\lambda q)$. This function is moreover differentiable in $t$, and by differentiation, we have

$$\begin{aligned}\varphi'(t)&=\frac{1}{N}\mathbb{E}\left\langle-\frac{\mathrm{d}H_t(\bm{x})}{\mathrm{d}t}\right\rangle_t\\&=\frac{1}{N}\mathbb{E}\left\langle-\frac{\lambda}{2N}\sum_{i<j}x_i^2x_j^2+\frac{1}{2}\sqrt{\frac{\lambda}{tN}}\sum_{i<j}W_{ij}x_ix_j+\frac{\lambda}{N}\sum_{i<j}x_ix_i^*x_jx_j^*\right\rangle_t\\&\quad+\frac{1}{N}\mathbb{E}\left\langle\frac{\lambda q}{2}\sum_{i=1}^N x_i^2-\frac{1}{2}\sqrt{\frac{\lambda q}{1-t}}\sum_{i=1}^N z_ix_i-\lambda q\sum_{i=1}^N x_ix_i^*\right\rangle_t.\end{aligned}$$

Now we use Gaussian integration by parts to eliminate the variables $W_{ij}$ and $z_i$. The details of this computation are explained extensively in many sources; see Talagrand, 2011a; Krzakala et al., 2016; Lelarge and Miolane, 2016. We get

$$\begin{aligned}\varphi'(t)&=-\frac{\lambda}{2N^2}\mathbb{E}\left\langle\sum_{i<j}x_i^{(1)}x_j^{(1)}x_i^{(2)}x_j^{(2)}\right\rangle_t+\frac{\lambda}{N^2}\mathbb{E}\left\langle\sum_{i<j}x_ix_i^*x_jx_j^*\right\rangle_t\\&\quad+\frac{\lambda q}{2N}\mathbb{E}\left\langle\sum_{i=1}^N x_i^{(1)}x_i^{(2)}\right\rangle_t-\frac{\lambda q}{N}\mathbb{E}\left\langle\sum_{i=1}^N x_ix_i^*\right\rangle_t.\end{aligned}$$

Completing the squares yields

$$\begin{aligned}\varphi'(t)&=-\frac{\lambda}{4}\mathbb{E}\left\langle(\bm{x}^{(1)}\cdot\bm{x}^{(2)}-q)^2\right\rangle_t+\frac{\lambda}{4}q^2+\frac{\lambda}{4N^2}\sum_{i=1}^N\mathbb{E}\left\langle{x_i^{(1)}}^2{x_i^{(2)}}^2\right\rangle_t\qquad(29)\\&\quad+\frac{\lambda}{2}\mathbb{E}\left\langle(\bm{x}\cdot\bm{x}^*-q)^2\right\rangle_t-\frac{\lambda}{2}q^2-\frac{\lambda}{2N^2}\sum_{i=1}^N\mathbb{E}\left\langle x_i^2{x_i^*}^2\right\rangle_t.\end{aligned}$$

The first line in the above expression involves overlaps between two independent replicas, while the second involves overlaps between one replica and the planted solution. Using the Nishimori property, the derivative of $\varphi$ can be written as

$$\varphi'(t)=\frac{\lambda}{4}\mathbb{E}\left\langle(R_{1,*}-q)^2\right\rangle_t-\frac{\lambda}{4}q^2-\frac{\lambda}{4N}\mathbb{E}\left\langle x_N^2{x_N^*}^2\right\rangle_t.\qquad(30)$$

The last term follows by symmetry between sites. Now, integrating over $t$, the difference between the free energy and the $\mathsf{RS}$ potential $F(\lambda,q)$ can be written in the form of a sum rule:

$$F_N-F(\lambda,q)=\frac{\lambda}{4}\int_0^1\Big(\mathbb{E}\left\langle(R_{1,*}-q)^2\right\rangle_t-\frac{1}{N}\mathbb{E}\left\langle x_N^2{x_N^*}^2\right\rangle_t\Big)\mathrm{d}t.\qquad(31)$$

We see from (31) that $F_N$ converges to $F(\lambda,q)$ if and only if the overlap $R_{1,*}$ concentrates about $q$. This happens only for a value of $q$ that maximizes the $\mathsf{RS}$ potential $F(\lambda,\cdot)$. Using Theorem 7 one can already prove the optimal $1/N$ rate below $\lambda_c$, or above it when the prior is not symmetric. Indeed, since $c(t)$ is lower-bounded by a positive constant in this case, the bound (26) yields $\int_0^1\mathbb{E}\langle(R_{1,*}-q^*)^2\rangle_t\,\mathrm{d}t\leq K(\lambda)/N$. Also, the second integrand in (31) is bounded by $K/N$ for some constant $K\geq 0$, so we have $F_N=\phi_{\mathsf{RS}}(\lambda)+\mathcal{O}(1/N)$ for all $\lambda\in\mathcal{A}$. If $\lambda>\lambda_c$ and the prior is symmetric, then we are only able to prove a rate of $1/\sqrt{N}$, due to the fact that $c(t)\sim c_0(1-t)^2$ as $t\to 1$. The $1/N$ rate would follow immediately in this case if one could improve the latter estimate to $c(t)\sim c_0(1-t)$. To go further, we use Theorem 8 and the additional fact that $\mathbb{E}\langle x_N^2{x_N^*}^2\rangle_t$ has a limit:

Lemma 9.

For all $\lambda\in\mathcal{A}$ and all $t\in[0,1)$, there exist constants $K(\lambda)\geq 0$ and $c(t)\geq 0$ such that
$$\left|\mathbb{E}\left\langle x_N^2{x_N^*}^2\right\rangle_t-a(0)\right|\leq K(\lambda)\left(\frac{1}{\sqrt{N}}+e^{-c(t)N}\right).$$

Moreover, $c(t)>0$ on $[0,1)$, and if either $\lambda<\lambda_c$ or $P_{\textup{x}}$ is not symmetric about the origin, then $c(t)\geq c_0$ for some constant $c_0=c_0(\lambda)>0$. Otherwise, $c(t)\sim c_0(1-t)^2$ as $t\to 1$.

The proof of Lemma 9 relies on the cavity method and will be presented in Section 5. Now we are ready to prove Theorem 2.

Proof of Theorem 2. By formula (31) with the choice $q=q^*(\lambda)$, we have

\begin{align*}
\left|N(F_N-\phi_{\mathsf{RS}}(\lambda))-\frac{\lambda}{4}\Big(\int_0^1\Delta_{\mathsf{RS}}(\lambda;t)\,\mathrm{d}t-a(0)\Big)\right|\le{}&\frac{\lambda}{4}\int_0^1\left|N\,\mathbb{E}\big\langle(R_{1,*}-q^*)^2\big\rangle_t-\Delta_{\mathsf{RS}}(\lambda;t)\right|\mathrm{d}t\\
&+\frac{\lambda}{4}\int_0^1\left|\mathbb{E}\big\langle x_N^2\,{x_N^*}^2\big\rangle_t-a(0)\right|\mathrm{d}t.
\end{align*}

By Theorem 8 and Lemma 9, the integrands on the right-hand side are bounded by $K/\sqrt{N}+KNe^{-c(t)N}$, where $c(t)>c_0>0$ for all $t$ whenever $\lambda<\lambda_c$ or $P_{\textup{x}}$ is not symmetric about the origin, so the convergence follows. The function $\psi_{\mathsf{RS}}(\lambda)$ is the second term on the left-hand side. Formula (13) follows by integration. $\blacksquare$
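As a concrete illustration of the role of $q^*(\lambda)$ (not part of the proof), consider the Rademacher prior $P_{\textup{x}}=\frac12(\delta_{+1}+\delta_{-1})$, for which $\psi(r)=\mathbb{E}_z\log\cosh(\sqrt{r}z+r)-r/2$ and stationarity of $F(\lambda,\cdot)$ reduces to the standard fixed-point equation $q=\mathbb{E}_z\tanh(\lambda q+\sqrt{\lambda q}\,z)$, with $\lambda_c=1$. A minimal numerical sketch (the quadrature order and iteration count are arbitrary choices):

```python
import numpy as np

# Gauss-Hermite quadrature for E_z f(z), z ~ N(0,1):
# E f(z) = (1/sqrt(pi)) * sum_i w_i f(sqrt(2) x_i)
nodes, weights = np.polynomial.hermite.hermgauss(80)
z = np.sqrt(2.0) * nodes
w = weights / np.sqrt(np.pi)

def fixed_point_q(lam, q0=0.5, iters=500):
    """Iterate q <- E_z tanh(lam*q + sqrt(lam*q) z) to approximate q*(lam)."""
    q = q0
    for _ in range(iters):
        q = float(np.sum(w * np.tanh(lam * q + np.sqrt(lam * q) * z)))
    return q

print(fixed_point_q(0.5))  # below lambda_c = 1: the overlap vanishes
print(fixed_point_q(2.0))  # above lambda_c: a strictly positive overlap
```

Below $\lambda_c$ the iteration collapses to $q^*=0$, while above it a positive fixed point appears, consistent with the discussion of the $\mathsf{RS}$ potential above.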

4.2 The main estimate: energy gap at suboptimal overlap

Recall the interpolating Hamiltonian $H_t$ from (22) with $r=\lambda q^*(\lambda)$. Let us now introduce the Franz-Parisi potential (Franz and Parisi, 1995). For $m\in\mathbb{R}$ and $\epsilon>0$ we define

\[
\Phi_\epsilon(m;t):=\frac{1}{N}\mathbb{E}\log\int\mathds{1}\{R_{1,*}\in[m,m+\epsilon)\}\,e^{-H_t(\bm{x})}\,\mathrm{d}P_{\textup{x}}^{\otimes N}(\bm{x}).\tag{32}
\]

This is the free energy of a subsystem of configurations having an overlap close to a fixed value $m$ with the planted signal $\bm{x}^*$. It is clear that $\Phi_\epsilon(m;t)\le\varphi(t)$, where the latter is the interpolating free energy defined in (28). The purpose of this section is to prove that when $m$ is far from $q^*(\lambda)$, there is a sizable gap between $\Phi_\epsilon(m;t)$ and $\varphi(t)$. This estimate is a main ingredient in our proof of overlap concentration. (The other main ingredient is the cavity method, presented in the next section.) To prove it we will need the auxiliary function

\[
\phi_{\mathsf{RS}}(\lambda;t)=\sup_{q\ge 0}\ \psi(\lambda q)-\frac{t\lambda q^2}{4}.
\]

One can show that the above formula is the limit of $\varphi(t)$ as $N\to\infty$, for example by using the so-called "Aizenman-Sims-Starr scheme" (Aizenman et al., 2003); see Lelarge and Miolane (2016). For our purposes we only need the inequality

\[
\varphi(t)\ge\phi_{\mathsf{RS}}(\lambda;t)-\frac{K\lambda t}{N},\tag{33}
\]

which can be proved using the interpolation method presented in the previous section, dropping the non-negative term $\mathbb{E}\langle(R_{1,*}-q^*)^2\rangle$ from the expression analogous to (30). It then suffices to compare $\Phi_\epsilon(m;t)$ to $\phi_{\mathsf{RS}}(\lambda;t)$. The result is given in Proposition 12, and we close this subsection with Proposition 13, which shows convergence in probability of the overlaps as a straightforward consequence.
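For intuition about $\phi_{\mathsf{RS}}(\lambda;t)$, here is a hedged numerical sketch in the Rademacher case $P_{\textup{x}}=\frac12(\delta_{+1}+\delta_{-1})$, where $\psi(r)=\mathbb{E}_z\log\cosh(\sqrt{r}z+r)-r/2$ (the grid resolution and quadrature order are arbitrary choices). It locates the maximizing $q$ for several values of $t$ and $\lambda$:

```python
import numpy as np

nodes, weights = np.polynomial.hermite.hermgauss(60)
z = np.sqrt(2.0) * nodes
w = weights / np.sqrt(np.pi)

def psi(r):
    # Rademacher prior: psi(r) = E_z log cosh(sqrt(r) z + r) - r/2
    return float(np.sum(w * np.log(np.cosh(np.sqrt(r) * z + r)))) - r / 2

def phi_rs(lam, t, grid=np.linspace(0.0, 1.0, 401)):
    """Grid approximation of sup_{q >= 0} psi(lam*q) - t*lam*q^2/4, with argmax."""
    vals = np.array([psi(lam * q) - t * lam * q**2 / 4 for q in grid])
    i = int(np.argmax(vals))
    return vals[i], grid[i]

print(phi_rs(2.0, 1.0))  # t = 1: maximized near q*(2) > 0
print(phi_rs(0.5, 1.0))  # below lambda_c = 1: maximized at q = 0
```

Since larger $t$ subtracts a larger quadratic penalty, $t\mapsto\phi_{\mathsf{RS}}(\lambda;t)$ is non-increasing, which the grid approximation reproduces.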

For $r\ge 0$ and $s\in\mathbb{R}$, we let

\[
\bar{\psi}(r,s):=\mathbb{E}_{x^*,z}\log\int\exp\left(\sqrt{r}\,zx+sxx^*-\frac{r}{2}x^2\right)\mathrm{d}P_{\textup{x}}(x).\tag{34}
\]

We see that $\bar{\psi}(r,r)=\psi(r)$, but unlike $\psi$, the function $\bar{\psi}$ does not have an interpretation as the $\mathsf{KL}$ divergence between two distributions. The next lemma states some key properties of this function that will be useful later on.
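The two properties of $\bar{\psi}$ stated next (convexity in $s$, and a gap between $\bar{\psi}(r,-r)$ and $\bar{\psi}(r,r)$ for asymmetric priors) can be checked numerically for any prior of finite support, since both expectations then reduce to finite sums and a one-dimensional Gaussian quadrature. A small sketch, using a hypothetical asymmetric prior on $\{0,1\}$ with $P(x=1)=0.3$ (both the prior and the quadrature order are illustrative choices):

```python
import numpy as np

nodes, weights = np.polynomial.hermite.hermgauss(60)
z = np.sqrt(2.0) * nodes
wz = weights / np.sqrt(np.pi)

# hypothetical asymmetric prior: P(x = 0) = 0.7, P(x = 1) = 0.3
atoms = np.array([0.0, 1.0])
pa = np.array([0.7, 0.3])

def psi_bar(r, s):
    """E_{x*,z} log int exp(sqrt(r) z x + s x x* - (r/2) x^2) dPx(x)."""
    total = 0.0
    for xs, ps in zip(atoms, pa):  # outer expectation over x*
        expo = (np.sqrt(r) * z[:, None] * atoms[None, :]
                + s * atoms[None, :] * xs
                - 0.5 * r * atoms[None, :] ** 2)
        inner = np.log(np.exp(expo) @ pa)  # log of the inner integral, per z-node
        total += ps * float(wz @ inner)
    return total

r = 0.8
print(psi_bar(r, r) - psi_bar(r, -r))  # strictly positive gap (asymmetric prior)
```

For this prior the gap $\bar{\psi}(r,r)-\bar{\psi}(r,-r)$ is strictly positive, and the midpoint convexity inequality in $s$ holds, in line with the lemma below.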

Lemma 10.

For all $r\ge 0$, the following hold:

  • The function $s\mapsto\bar{\psi}(r,s)$ is strictly convex, hence strongly convex on any compact set.

  • There exists a constant $c=c(r,P_{\textup{x}})\ge 0$ such that $\bar{\psi}(r,-r)\le\bar{\psi}(r,r)-c$. If $r>0$ then $c>0$, unless the prior $P_{\textup{x}}$ is symmetric about the origin (in which case $\bar{\psi}(r,-r)=\bar{\psi}(r,r)$).

  • The map $r\mapsto c(r,P_{\textup{x}})$ is increasing on $\mathbb{R}_+$.

The proof of the above lemma can be found in the Appendix. We now state a useful interpolation bound on $\Phi_\epsilon(m;t)$. This is a simpler version of the Guerra-Talagrand $\mathsf{1RSB}$ interpolation bound at fixed overlap, a key invention that ultimately paved the way towards a proof of the Parisi formula (Guerra, 2003; Talagrand, 2006). In some sense, since we are dealing with a planted model, we only need a replica-symmetric version of this bound.

Proposition 11.

Fix $m\in\mathbb{R}$, $\epsilon>0$, $t\in[0,1]$ and $\lambda\ge 0$. Let $r=(1-t)\lambda q^*+t\lambda|m|$ and $\bar{r}=(1-t)\lambda q^*+t\lambda m$. There exist constants $K=K(P_{\textup{x}})>0$ and $K'=K'(\lambda)>0$ such that

\[
\Phi_\epsilon(m;t)\le\bar{\psi}\big(r,\bar{r}\big)-\frac{t\lambda m^2}{4}-K\left(m-\partial_s\bar{\psi}\big(r,\bar{r}\big)\right)^2+K'\epsilon^2+\frac{K'}{N}.
\]
Proof.

To obtain a bound on $\Phi_\epsilon(m;t)$ for any fixed $t$, we use the interpolation method with the Hamiltonian

\begin{align*}
-H_{t,s}(\bm{x}):={}&\sum_{i<j}-\frac{ts\lambda}{2N}x_i^2x_j^2+\sqrt{\frac{ts\lambda}{N}}W_{ij}x_ix_j+\frac{ts\lambda}{N}x_ix_i^*x_jx_j^*\\
&+\sum_{i=1}^N-\frac{(1-t)\lambda q^*}{2}x_i^2+\sqrt{(1-t)\lambda q^*}\,z_ix_i+(1-t)\lambda q^*x_ix_i^*\\
&+\sum_{i=1}^N-\frac{(1-s)t\lambda|m|}{2}x_i^2+\sqrt{(1-s)t\lambda|m|}\,z_i'x_i+(1-s)t\lambda m\,x_ix_i^*,
\end{align*}

by varying $s\in[0,1]$. The random variables $W,z,z'$ are all i.i.d. standard Gaussians, independent of everything else. We define

\[
\varphi(t,s):=\frac{1}{N}\mathbb{E}\log\int\mathds{1}\{R_{1,*}\in[m,m+\epsilon)\}\,e^{-H_{t,s}(\bm{x})}\,\mathrm{d}P_{\textup{x}}^{\otimes N}(\bm{x}).
\]

We compute the derivative with respect to $s$. The same algebraic manipulations conducted in the computation of $\varphi'$ up to (29) apply here, and we get

\begin{align*}
\partial_s\varphi(t,s)={}&-\frac{\lambda t}{4}\mathbb{E}\left\langle(\bm{x}^{(1)}\cdot\bm{x}^{(2)}-|m|)^2\right\rangle_{t,s}+\frac{\lambda t}{4}|m|^2+\frac{\lambda t}{4N^2}\sum_{i=1}^N\mathbb{E}\left\langle{x_i^{(1)}}^2{x_i^{(2)}}^2\right\rangle_{t,s}\\
&+\frac{\lambda t}{2}\mathbb{E}\left\langle(\bm{x}\cdot\bm{x}^*-m)^2\right\rangle_{t,s}-\frac{\lambda t}{2}m^2-\frac{\lambda t}{2N^2}\sum_{i=1}^N\mathbb{E}\left\langle x_i^2{x_i^*}^2\right\rangle_{t,s},
\end{align*}

where $\langle\cdot\rangle_{t,s}$ is the Gibbs average w.r.t. the Hamiltonian $-H_{t,s}(\bm{x})+\log\mathds{1}\{\bm{x}\cdot\bm{x}^*\in[m,m+\epsilon)\}$. A few things now happen. Notice that the planted term (the first term in the second line) is trivially smaller than $t\lambda\epsilon^2/2$ due to the overlap restriction. Moreover, the last terms in both lines are of order $1/N$, since the variables $x_i$ are bounded. The first term in the first line, which involves the overlap between two replicas, is more challenging. What makes it difficult to control is that the Gibbs measure $\langle\cdot\rangle_{t,s}$ no longer satisfies the Nishimori property due to the overlap restriction, so the overlap between two replicas no longer has the same distribution as the overlap of one replica with the planted spike. Fortunately, this term is always non-positive, so we can ignore it altogether and obtain an upper bound:

\[
\partial_s\varphi(t,s)\le-\frac{\lambda t}{4}m^2+\frac{\lambda t\epsilon^2}{2}+\frac{\lambda K}{N}.
\]

Integrating over $s$, we get

\[
\Phi_\epsilon(m;t)\le\varphi(t,0)-\frac{\lambda t}{4}m^2+\frac{\lambda t\epsilon^2}{2}+\frac{\lambda K}{N}.
\]

It now remains to show that

\[
\varphi(t,0)\le\bar{\psi}(r,\bar{r})-K\left(m-\partial_s\bar{\psi}(r,\bar{r})\right)^2+\mathcal{O}(\epsilon^2).
\]

By properties of the Gaussian distribution, we can write $-H_{t,0}=\sum_{i=1}^N\sqrt{r}\,z_ix_i+\bar{r}x_ix_i^*-\frac{r}{2}x_i^2$. Define the following (random) probability measure:

\[
G(A):=\frac{\int_Ae^{-H_{t,0}(\bm{x})}\,\mathrm{d}P_{\textup{x}}^{\otimes N}(\bm{x})}{\int e^{-H_{t,0}(\bm{x})}\,\mathrm{d}P_{\textup{x}}^{\otimes N}(\bm{x})},
\]

for all Borel sets $A\subseteq\mathbb{R}^N$. We observe that, conditionally on the Gaussian vector $\bm{z}$ and the planted vector $\bm{x}^*$, $G$ is a product measure, due to the additive form of $H_{t,0}$. Moreover,

\[
\varphi(t,0)-\bar{\psi}(r,\bar{r})=\frac{1}{N}\mathbb{E}\log G(\{R_{1,*}\in[m,m+\epsilon)\}),
\]

so we will be interested in a large deviation bound on this quantity. The prior $P_{\textup{x}}$ has bounded support, so the marginals of $G$ (conditionally on $\bm{x}^*$ and $\bm{z}$) are clearly sub-Gaussian. Therefore, by concentration of measure, the empirical average $\bm{x}\cdot\bm{x}^*/N$ must concentrate around its expectation: for all $u\ge 0$,

\[
G\left(\left\{\frac{1}{N}\sum_{i=1}^N(x_i-\mathbb{E}_G[x_i])x_i^*\ge u\right\}\right)\le e^{-Nu^2/2K^2},
\]

where $K$ is, for instance, twice the diameter of the support of $P_{\textup{x}}$. This implies

\[
\frac{1}{N}\log G(\{R_{1,*}\in[m,m+\epsilon)\})\le-\frac{(m-\widehat{q})^2}{2K^2}\mathds{1}\{\widehat{q}\le m\}-\frac{(m+\epsilon-\widehat{q})^2}{2K^2}\mathds{1}\{\widehat{q}\ge m+\epsilon\},
\]

where $\widehat{q}=\frac{1}{N}\sum_{i=1}^N\mathbb{E}_G[x_i]x_i^*$. Now, by Jensen's inequality (the maps $x\mapsto(x-a)^2\mathds{1}\{x\le a\}$ and $x\mapsto(x-b)^2\mathds{1}\{x\ge b\}$ are convex), we can write

\begin{align*}
\frac{1}{N}\mathbb{E}\log G(\{R_{1,*}\in[m,m+\epsilon)\})\le{}&-\frac{(m-\mathbb{E}[\widehat{q}])^2}{2K^2}\mathds{1}\left\{\mathbb{E}[\widehat{q}]\le m\right\}\\
&-\frac{(m+\epsilon-\mathbb{E}[\widehat{q}])^2}{2K^2}\mathds{1}\left\{\mathbb{E}[\widehat{q}]\ge m+\epsilon\right\}.
\end{align*}

Since

\[
\mathbb{E}_G[x_i]=\frac{\int x\,e^{\sqrt{r}z_ix+\bar{r}xx_i^*-\frac{r}{2}x^2}\,\mathrm{d}P_{\textup{x}}(x)}{\int e^{\sqrt{r}z_ix+\bar{r}xx_i^*-\frac{r}{2}x^2}\,\mathrm{d}P_{\textup{x}}(x)},
\]

we have $\mathbb{E}[\widehat{q}]=\partial_s\bar{\psi}(r,\bar{r})$. We now use the elementary inequality $\frac{1}{2}(x-a)^2-(a-b)^2\le(x-a)^2\mathds{1}\{x\le a\}+(x-b)^2\mathds{1}\{x\ge b\}$, valid for all $x\in\mathbb{R}$ and $a\le b$, to simplify the above bound and obtain

\[
\frac{1}{N}\mathbb{E}\log G(\{R_{1,*}\in[m,m+\epsilon)\})\le-\frac{(m-\partial_s\bar{\psi}(r,\bar{r}))^2}{4K^2}+\frac{\epsilon^2}{2K^2}.
\]

This allows us to conclude. \blacksquare
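The elementary inequality $\frac12(x-a)^2-(a-b)^2\le(x-a)^2\mathds{1}\{x\le a\}+(x-b)^2\mathds{1}\{x\ge b\}$ used at the end of the proof is easy to verify by treating the three cases $x\le a$, $a<x<b$ and $x\ge b$ separately; a quick randomized sanity check (sample size and scales are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
for _ in range(10_000):
    a, b = np.sort(rng.normal(size=2))  # guarantees a <= b
    x = rng.normal(scale=3.0)
    lhs = 0.5 * (x - a) ** 2 - (a - b) ** 2
    rhs = (x - a) ** 2 * (x <= a) + (x - b) ** 2 * (x >= b)
    assert lhs <= rhs + 1e-12
print("inequality verified on all samples")
```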

A consequence of the above proposition is an energy gap property: if $m$ is far from $q^*(\lambda)$, then the free energy $\Phi_\epsilon(m;t)$ of the configurations having overlap $m$ with $\bm{x}^*$ is strictly smaller than $\phi_{\mathsf{RS}}(\lambda;t)$:

Proposition 12.

For all $\lambda\in\mathcal{A}$, all $\epsilon>0$ and all $t\in[0,1]$, there exist constants $c=c(\lambda,\epsilon,t,P_{\textup{x}})\ge 0$ and $\epsilon'=\epsilon'(\lambda,\epsilon)>0$ such that

\[
\forall m\in\mathbb{R}:\qquad|m-q^*(\lambda)|\ge\epsilon\quad\implies\quad\Phi_{\epsilon'}(m;t)\le\phi_{\mathsf{RS}}(\lambda;t)-c.
\]

Moreover, if $t<1$ then $c>0$. If either $\lambda<\lambda_c$ or $P_{\textup{x}}$ is not symmetric about the origin, then $\inf_{t\in[0,1]}c(t)>0$. Lastly, if $\lambda>\lambda_c$ and $P_{\textup{x}}$ is symmetric, then $c(t)\sim c_0(1-t)$ as $t\to 1$, for some $c_0=c_0(\lambda,\epsilon,P_{\textup{x}})>0$.

A direct consequence of the above energy gap result is the convergence in probability of the overlaps:

Proposition 13.

For all $\lambda\in\mathcal{A}$, all $\epsilon>0$ and all $t\in[0,1]$, there exist constants $K=K(\lambda,\epsilon)\ge 0$ and $c=c(\lambda,\epsilon,t,P_{\textup{x}})\ge 0$ such that

\[
\mathbb{E}\left\langle\mathds{1}\left\{\big|R_{1,*}-q^*(\lambda)\big|\ge\epsilon\right\}\right\rangle_t\le Ke^{-cN},
\]

where the constant $c$ has the same properties as in Proposition 12, except that $c(t)\sim c_0(1-t)^2$ as $t\to 1$. Moreover, if $\lambda>\lambda_c$, $P_{\textup{x}}$ is symmetric and $t=1$, then one still has

\[
\mathbb{E}\left\langle\mathds{1}\left\{\big||R_{1,*}|-q^*(\lambda)\big|\ge\epsilon\right\}\right\rangle_t\le Ke^{-cN},
\]

with $c=c(\lambda,\epsilon,P_{\textup{x}})>0$.

To prove the above proposition, we first show that the partition function of the model enjoys sub-Gaussian concentration on a logarithmic scale. This is an elementary consequence of two classical concentration-of-measure results: concentration of Lipschitz functions of Gaussian random variables, and concentration of convex Lipschitz functions of bounded random variables; see Boucheron et al. (2013) and van Handel (2014).

Lemma 14.

Let $A$ be a Borel subset of $\mathbb{R}^N$, and define the random variable

\[
Z:=\int_Ae^{-H_t(\bm{x})}\,\mathrm{d}P_{\textup{x}}^{\otimes N}(\bm{x}).
\]

There exists a constant $K>0$, depending only on $\lambda$ and $P_{\textup{x}}$, such that for all $u\ge 0$,

\[
\Pr\left(\left|\frac{1}{N}\log Z-\frac{1}{N}\mathbb{E}\log Z\right|\ge u\right)\le 4e^{-Nu^2/K}.
\]

Proof of Lemma 14. It suffices to notice that the map $(\bm{W},\bm{z})\mapsto\frac{1}{N}\log Z(\bm{W},\bm{z})$ is Lipschitz with constant $K\sqrt{\lambda/N}$ for every $\bm{x}^*\in\mathbb{R}^N$, and that the map $\bm{x}^*\mapsto\frac{1}{N}\mathbb{E}[\log Z\,|\,\bm{x}^*]$ (where the expectation is over $\bm{W},\bm{z}$) is convex and Lipschitz with constant $K\lambda/\sqrt{N}$. We then use concentration of Lipschitz functions of Gaussian r.v.'s and of convex Lipschitz functions of bounded r.v.'s (since the coordinates $x_i^*$ are bounded). $\blacksquare$
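The scaling in Lemma 14 can be illustrated on a small instance. The sketch below (not part of the proof) uses a simplified Hamiltonian, keeping only the Gaussian interaction term with a uniform prior on $\{-1,1\}^N$; the sizes, SNR and sample counts are arbitrary choices. It checks that the empirical fluctuations of $\frac1N\log Z$ shrink as $N$ grows:

```python
import numpy as np
from itertools import product

def free_energy_samples(N, lam, n_samples, seed=0):
    """Samples of (1/N) log E_x exp(sqrt(lam/N) sum_{i<j} W_ij x_i x_j),
    x uniform on {-1,1}^N, by exhaustive enumeration of configurations."""
    rng = np.random.default_rng(seed)
    X = np.array(list(product([-1.0, 1.0], repeat=N)))
    iu, ju = np.triu_indices(N, k=1)
    P = X[:, iu] * X[:, ju]  # pairwise products x_i x_j, one row per config
    out = []
    for _ in range(n_samples):
        W = rng.normal(size=iu.shape)
        E = np.sqrt(lam / N) * (P @ W)  # energies of all 2^N configurations
        m = E.max()  # stabilized log-mean-exp
        out.append((m + np.log(np.mean(np.exp(E - m)))) / N)
    return np.array(out)

s8 = free_energy_samples(8, 0.5, 200).std()
s14 = free_energy_samples(14, 0.5, 200).std()
print(s8, s14)  # fluctuations decrease with N
```

The empirical standard deviation at $N=14$ is visibly smaller than at $N=8$, consistent with the sub-Gaussian bound of Lemma 14.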

Proof of Proposition 13. For $\epsilon,\epsilon'>0$ and $t\in[0,1)$, we can write the decomposition

\begin{align*}
\mathbb{E}\left\langle\mathds{1}\left\{|R_{1,*}-q^*(\lambda)|\ge\epsilon\right\}\right\rangle_t={}&\sum_{l\ge 0}\mathbb{E}\left\langle\mathds{1}\big\{R_{1,*}-q^*-\epsilon\in[l\epsilon',(l+1)\epsilon')\big\}\right\rangle_t\\
&+\sum_{l\ge 0}\mathbb{E}\left\langle\mathds{1}\big\{-R_{1,*}+q^*-\epsilon\in[l\epsilon',(l+1)\epsilon')\big\}\right\rangle_t,
\end{align*}

where the integer index $l$ ranges over a finite set of size at most $K/\epsilon'$, since the prior $P_{\textup{x}}$ has bounded support. We only treat the first sum, since the argument extends trivially to the second. Let $A=\big\{R_{1,*}-q^*-\epsilon\in[l\epsilon',(l+1)\epsilon')\big\}$ and write

\[
\mathbb{E}\left\langle\mathds{1}(A)\right\rangle_t=\mathbb{E}\left[\frac{\int_Ae^{-H_t(\bm{x})}\,\mathrm{d}P_{\textup{x}}^{\otimes N}(\bm{x})}{\int e^{-H_t(\bm{x})}\,\mathrm{d}P_{\textup{x}}^{\otimes N}(\bm{x})}\right].\tag{35}
\]

By Lemma 14, the numerator and denominator inside the expectation in (35) enjoy sub-Gaussian concentration on a logarithmic scale. For any given $l$ and $u\ge 0$, we simultaneously have

\begin{align*}
\frac{1}{N}\log\int e^{-H_t(\bm{x})}\,\mathrm{d}P_{\textup{x}}^{\otimes N}(\bm{x})&\ge\frac{1}{N}\mathbb{E}\log\int e^{-H_t(\bm{x})}\,\mathrm{d}P_{\textup{x}}^{\otimes N}(\bm{x})-u\\
&\ge\phi_{\mathsf{RS}}(\lambda;t)-\frac{Kt\lambda}{N}-u
\end{align*}

(the last inequality comes from (33)), and

\begin{align*}
\frac{1}{N}\log\int_Ae^{-H_t(\bm{x})}\,\mathrm{d}P_{\textup{x}}^{\otimes N}(\bm{x})&\le\frac{1}{N}\mathbb{E}\log\int_Ae^{-H_t(\bm{x})}\,\mathrm{d}P_{\textup{x}}^{\otimes N}(\bm{x})+u\\
&=\Phi_{\epsilon'}(q^*+\epsilon+l\epsilon';t)+u,
\end{align*}

with probability at least $1-4e^{-Nu^2/K}$. On the complement of this event, we simply bound the fraction in (35) by 1. Combining the above bounds, we have

\[
\mathbb{E}\left\langle\mathds{1}(A)\right\rangle_t\le 4e^{-Nu^2/K}+e^{N(\delta+2u)},
\]

where $\delta=\Phi_{\epsilon'}(m;t)-\phi_{\mathsf{RS}}(\lambda;t)+K\lambda t/N$ with $m=q^*+\epsilon+l\epsilon'$. We let $\epsilon'$ be a function of $\lambda$ and $\epsilon$ as dictated by Proposition 12, and take $c>0$ such that $\delta\le-c$ for all $m$ with $|m-q^*|\ge\epsilon$. We then conclude by letting $u=-\delta/3$. Finally, if $P_{\textup{x}}$ is symmetric and $t=1$, it suffices to consider non-negative values of $m$ in the above argument to prove the corresponding statement. $\blacksquare$

Proof of Proposition 12. The gap we seek will come from different sources, depending on the case at hand. The treatment is split into several nested cases, according to whether $t$ is small or large, and whether $m$ is positive or negative.

Large t𝑡t.

Assume $t\ge t_0$, with $t_0$ to be determined later. For $m\ge 0$, Proposition 11 implies

\[
\Phi_{\epsilon'}(m;t)\le\psi((1-t)\lambda q^*+t\lambda m)-\frac{t\lambda m^2}{4}+K'\epsilon'^2+\frac{K'}{N}.
\]

Since $\psi$ is a convex function, we have

\begin{align}
\psi((1-t)\lambda q^*+t\lambda m)-\frac{t\lambda m^2}{4}&\le(1-t)\psi(\lambda q^*)+t\psi(\lambda m)-\frac{t\lambda m^2}{4}\nonumber\\
&=(1-t)\psi(\lambda q^*)+tF(\lambda,m).\tag{36}
\end{align}

Since $q^*(\lambda)$ is the unique maximizer of $m\mapsto F(\lambda,m)$, the condition $|m-q^*|\ge\epsilon>0$ implies that $F(\lambda,m)\le F(\lambda,q^*)-c(\epsilon)$ for some $c(\epsilon)>0$. This in turn implies

\begin{align*}
\Phi_{\epsilon'}(m;t)&\le(1-t)\psi(\lambda q^*)+t\Big(\psi(\lambda q^*)-\frac{\lambda q^{*2}}{4}-c(\epsilon)\Big)+K'\epsilon'^2+\frac{K'}{N}\\
&=\psi(\lambda q^*)-\frac{t\lambda q^{*2}}{4}-tc(\epsilon)+K'\epsilon'^2+\frac{K'}{N}\\
&\le\phi_{\mathsf{RS}}(\lambda;t)-\frac{t_0c(\epsilon)}{2}+\frac{K'}{N},
\end{align*}

where we have chosen $\epsilon'$ such that $K'\epsilon'^2<t_0c(\epsilon)/2$. This settles the case $m\ge 0$. We now prove the same bound on $\Phi_{\epsilon'}$ for negative overlaps. Proposition 11 implies that for $m>0$,

\[
\Phi_{\epsilon'}(-m;t)\le\bar{\psi}\big((1-t)\lambda q^*+t\lambda m,\,(1-t)\lambda q^*-t\lambda m\big)-\frac{t\lambda m^2}{4}+K'\epsilon'^2+\frac{K'}{N}.\tag{37}
\]

If $\lambda<\lambda_c$ then $q^*(\lambda)=0$, and by Lemma 10 we have $\bar{\psi}(t\lambda m,-t\lambda m)-\frac{t\lambda m^2}{4}\le\psi(t\lambda m)-\frac{t\lambda m^2}{4}\le t_0F(\lambda,m)$, and we finish the argument as in the case of positive overlap. We now deal with the case $\lambda>\lambda_c$.

  • Suppose $|m-q^*|\ge\epsilon$. We let $r=(1-t)\lambda q^*+t\lambda m$ and $\alpha=\frac{(1-t)q^*}{(1-t)q^*+tm}$. Then

\begin{align*}
\bar{\psi}\big((1-t)\lambda q^*+t\lambda m,\,(1-t)\lambda q^*-t\lambda m\big)&=\bar{\psi}(r,\alpha r-(1-\alpha)r)\\
&\le\alpha\bar{\psi}(r,r)+(1-\alpha)\bar{\psi}(r,-r)\\
&\le\psi(r),
\end{align*}

    where the last two lines follow from Lemma 10. Since $|m-q^*|\ge\epsilon$, we finish once again as in the positive-overlap case, starting from line (36).

  • Suppose $|m-q^*|\le\epsilon$. Then $1-\alpha\ge t_0\frac{q^*-\epsilon}{q^*+\epsilon}$ and $|r-\lambda q^*|\le\lambda\epsilon$. If $P_{\textup{x}}$ is not symmetric, we use the bounds of Lemma 10:

\begin{align*}
\bar{\psi}\big((1-t)\lambda q^*+t\lambda m,\,(1-t)\lambda q^*-t\lambda m\big)&\le\alpha\bar{\psi}(r,r)+(1-\alpha)\bar{\psi}(r,-r)\\
&\le\psi(r)-(1-\alpha)c(r)\\
&\le\psi(r)-t_0\frac{q^*-\epsilon}{q^*+\epsilon}\,c(\lambda(q^*-\epsilon)).
\end{align*}

    The last line follows since $r\mapsto c(r)$ is increasing. We then finish the argument by plugging this bound into (37).

Small t𝑡t.

Assume now that $t\le t_0$. In this situation, we draw the gap from the (so far unused) term $(m-\partial_s\bar{\psi}(r,\bar{r}))^2$ in Proposition 11. The functions $\bar{\psi}(r,\cdot)$ and $r\mapsto\bar{\psi}(r,r)=\psi(r)$ have bounded second derivatives, so

\[
\max\big\{\left|\partial_s\bar{\psi}(r,\bar{r})-\partial_s\bar{\psi}(r,r)\right|,\ \left|\partial_s\bar{\psi}(r,r)-\partial_s\bar{\psi}(\lambda q^*,\lambda q^*)\right|\big\}\le K\lambda t_0.
\]

Moreover,

\[
(m-q^*)^2\le 2(m-\partial_s\bar{\psi}(r,\bar{r}))^2+2(q^*-\partial_s\bar{\psi}(r,\bar{r}))^2.
\]

Since $\partial_s\bar{\psi}(\lambda q^*,\lambda q^*)=q^*$, we have

\[
(m-\partial_s\bar{\psi}(r,\bar{r}))^2\ge\frac{1}{2}(m-q^*)^2-K\lambda^2t_0^2\ge\frac{\epsilon^2}{2}-K\lambda^2t_0^2,
\]

and choosing $t_0$ accordingly small finishes the argument.

Note that the assumption that $P_{\textup{x}}$ is not symmetric about the origin is used only in the case where the (negative) overlap $-m$ is close to $-q^*$. Consequently, the gap is independent of $t$ in all cases. Without this asymmetry assumption (and when $q^*>0$), there is no hope of a gap independent of $t$, since the potential $\Phi_{\epsilon'}(m;t)$ comes closer and closer to being even as $t\to 1$. We can nevertheless obtain a gap that depends on $t(1-t)$ via a strong-convexity argument.

The function $s\mapsto\bar{\psi}(r,s)$ is strongly convex on any interval, and for all $r\geq 0$. Therefore, recalling $r=(1-t)\lambda q^{*}+t\lambda m$ and $\alpha=\frac{(1-t)q^{*}}{(1-t)q^{*}+tm}$, there exists a constant $c>0$ depending only on $\lambda$ and $P_{\textup{x}}$ (through a bound on $r$) such that
\begin{align*}
\bar{\psi}\big((1-t)\lambda q^{*}+t\lambda m,\,(1-t)\lambda q^{*}-t\lambda m\big) &=\bar{\psi}(r,\alpha r-(1-\alpha)r)\\
&\leq\alpha\bar{\psi}(r,r)+(1-\alpha)\bar{\psi}(r,-r)-\frac{c}{2}\alpha(1-\alpha)(2r)^{2}\\
&=\alpha\bar{\psi}(r,r)+(1-\alpha)\bar{\psi}(r,-r)-2ct(1-t)\lambda^{2}q^{*}m\\
&\leq\bar{\psi}(r,r)-2ct(1-t)\lambda^{2}q^{*}(q^{*}-\epsilon),
\end{align*}
where the last bound follows from $\bar{\psi}(r,-r)\leq\bar{\psi}(r,r)$ and $|m-q^{*}|\leq\epsilon$ (recall that this is the only case where such an argument is needed).

5 The cavity method

Now that we have established the convergence in probability of $R_{1,*}$ to $q^{*}(\lambda)$ under $\operatorname{\mathbb{E}}\langle\cdot\rangle_{t}$ in Lemma 13, we use the cavity method to prove the convergence of the moments of the overlap. In essence, the cavity method amounts to isolating one variable from the system and analyzing the influence of the remaining variables on it. It was initially introduced as an analytic alternative to the replica method for solving certain models of spin glasses (Mézard et al., 1990), and has since been tremendously successful in predicting the behavior of many mean-field models. The underlying principle is known as the leave-one-out method in statistics. In our setting, this principle materializes in the form of (yet another) interpolation method that separates the last variable from the rest.

Our proofs of Theorems 7 and 8 are interlaced. The skeleton of the argument is as follows:

1. We first prove convergence of the second moment: $\operatorname{\mathbb{E}}\left\langle(R_{1,*}-q^{*})^{2}\right\rangle_{t}\leq\mathcal{O}(1/N+e^{-c(t)N})$.

2. We then deduce from 1. the convergence of the fourth moment via an inductive argument: $\operatorname{\mathbb{E}}\left\langle(R_{1,*}-q^{*})^{4}\right\rangle_{t}\leq\mathcal{O}(1/N^{2}+e^{-c(t)N})$. This finishes the proof of Theorem 7.

3. Using 2., we revisit our proof of 1. and refine the estimates in order to obtain the sharper result $N\cdot\operatorname{\mathbb{E}}\left\langle(R_{1,*}-q^{*})^{2}\right\rangle_{t}\to\Delta_{\mathsf{RS}}(\lambda;t)$. This finishes the proof of Theorem 8.

We start by defining our interpolating Hamiltonian and stating some preliminary bounds and properties; we then move on to the cavity computations themselves.

5.1 Preliminary bounds

In this section the parameter $t\in[0,1]$ is fixed and treated as a constant. We consider the Hamiltonian
\begin{align*}
-H_{t}^{-}(\bm{x}) &:=\sum_{i<j\leq N-1}-\frac{\lambda t}{2N}x_{i}^{2}x_{j}^{2}+\sqrt{\frac{\lambda t}{N}}W_{ij}x_{i}x_{j}+\frac{\lambda t}{N}x_{i}x_{i}^{*}x_{j}x_{j}^{*}\\
&\quad+\sum_{i=1}^{N-1}-\frac{(1-t)r}{2}x_{i}^{2}+\sqrt{(1-t)r}\,z_{i}x_{i}+(1-t)r\,x_{i}x_{i}^{*},
\end{align*}

where we have stripped away the contribution of the variable $x_{N}$ from $H_{t}$ (equ. (22)). This contribution is considered separately: for $t'\in[0,1]$, we let
\begin{align*}
-h_{t'}(\bm{x}) &:=\sum_{i=1}^{N-1}-\frac{\lambda t'}{2N}x_{i}^{2}x_{N}^{2}+\sqrt{\frac{\lambda t'}{N}}W_{iN}x_{i}x_{N}+\frac{\lambda t'}{N}x_{i}x_{i}^{*}x_{N}x_{N}^{*}\\
&\quad-\frac{(1-t')r}{2}x_{N}^{2}+\sqrt{(1-t')r}\,z_{N}x_{N}+(1-t')r\,x_{N}x_{N}^{*}.
\end{align*}
We let $r=\lambda q^{*}(\lambda)$ and define our interpolation, with time parameter $s\in[0,1]$, as
\[
H_{t,s}(\bm{x}):=H_{t}^{-}(\bm{x})+h_{ts}(\bm{x}).
\]

At $s=1$ we have $H_{t,s}=H_{t}$, and at $s=0$ the variable $x_{N}$ decouples from the rest of the variables. For an integer $n\geq 1$ and $f:(\mathbb{R}^{N})^{n+1}\to\mathbb{R}$, we define
\[
\left\langle f(\bm{x}^{(1)},\cdots,\bm{x}^{(n)},\bm{x}^{*})\right\rangle_{t,s}:=\frac{\int f(\bm{x}^{(1)},\cdots,\bm{x}^{(n)},\bm{x}^{*})\prod_{l=1}^{n}e^{-H_{t,s}(\bm{x}^{(l)})}\,\mathrm{d}P_{\textup{x}}^{\otimes N}(\bm{x}^{(l)})}{\int\prod_{l=1}^{n}e^{-H_{t,s}(\bm{x}^{(l)})}\,\mathrm{d}P_{\textup{x}}^{\otimes N}(\bm{x}^{(l)})},
\]

similarly to (21). Following Talagrand's notation, we write
\[
R^{-}_{l,l'}=\frac{1}{N}\sum_{i=1}^{N-1}x_{i}^{(l)}x_{i}^{(l')},\quad\mbox{and}\quad\nu_{s}(f)=\operatorname{\mathbb{E}}\langle f\rangle_{t,s}.
\]
In this last notation we have only emphasized the dependence of the average on $s$; the parameter $t$ will henceforth remain fixed. Moreover, we write $\nu(f)$ for $\nu_{1}(f)$. The following three lemmas are variants of Lemma 1.6.3, Lemma 1.6.4 and Proposition 1.8.1, respectively, in (Talagrand, 2011a).
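As an aside, for very small systems the Gibbs average just defined can be evaluated by brute-force enumeration. The sketch below (an illustration only, not part of any proof) takes the Rademacher prior $P_{\textup{x}}=\frac{1}{2}(\delta_{-1}+\delta_{+1})$ and $t=s=1$, in which case $H_{t,s}$ reduces to the planted Hamiltonian over all pairs $i<j$, and computes $\langle R_{1,*}\rangle$ for one realization of the disorder; the size $N=8$ and the value of $\lambda$ are arbitrary choices.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
N, lam = 8, 2.0

# Planted spike x* (Rademacher) and Wigner noise W (only entries i < j used).
x_star = rng.choice([-1.0, 1.0], size=N)
W = np.triu(rng.normal(size=(N, N)), 1)

def neg_H(x):
    """-H_t(x) at t = 1; for +/-1 spins the x_i^2 x_j^2 term is constant."""
    total = 0.0
    for i in range(N):
        for j in range(i + 1, N):
            total += (-lam / (2 * N) * x[i] ** 2 * x[j] ** 2
                      + np.sqrt(lam / N) * W[i, j] * x[i] * x[j]
                      + lam / N * x[i] * x_star[i] * x[j] * x_star[j])
    return total

# Gibbs average <R_{1,*}> by exhaustive enumeration of the 2^N configurations.
configs = np.array(list(itertools.product([-1.0, 1.0], repeat=N)))
log_w = np.array([neg_H(x) for x in configs])
weights = np.exp(log_w - log_w.max())
weights /= weights.sum()
mean_overlap = float(weights @ (configs @ x_star) / N)
print(mean_overlap)  # a number in [-1, 1]
```

For moderate $\lambda$ and such a tiny $N$ the overlap fluctuates heavily from sample to sample; the point is only to make the bracket $\langle\cdot\rangle_{t,s}$ concrete.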

Lemma 15.

For all $n\geq 1$,
\begin{align*}
\frac{\mathrm{d}}{\mathrm{d}s}\nu_{s}(f) &=\frac{\lambda t}{2}\sum_{1\leq l\neq l'\leq n}\nu_{s}\big((R^{-}_{l,l'}-q)y^{(l)}y^{(l')}f\big)-\lambda tn\sum_{l=1}^{n}\nu_{s}\big((R^{-}_{l,n+1}-q)y^{(l)}y^{(n+1)}f\big)\\
&\quad+\lambda tn\sum_{l=1}^{n}\nu_{s}\big((R^{-}_{l,*}-q)y^{(l)}y^{*}f\big)-\lambda tn\,\nu_{s}\big((R^{-}_{n+1,*}-q)y^{(n+1)}y^{*}f\big)\\
&\quad+\lambda t\frac{n(n+1)}{2}\nu_{s}\big((R^{-}_{n+1,n+2}-q)y^{(n+1)}y^{(n+2)}f\big),
\end{align*}
where we have written $y=x_{N}$.

Proof.

The computation relies on Gaussian integration by parts. See (Talagrand, 2011a, Lemma 1.6.3) for the details of a similar computation. $\blacksquare$

Lemma 16.

If $f$ is a bounded non-negative function, then for all $s\in[0,1]$,
\[
\nu_{s}(f)\leq K(\lambda,n)\,\nu(f).
\]
Proof.

Since the variables and the overlaps are all bounded, and $t\leq 1$, Lemma 15 yields, for all $s\in[0,1]$,
\[
|\nu_{s}'(f)|\leq K(\lambda,n)\,\nu_{s}(f).
\]
We then conclude using Grönwall's lemma. $\blacksquare$
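To spell out the Grönwall step (with $K=K(\lambda,n)$, and using that $\nu_{s}(f)\geq 0$ since $f\geq 0$), the differential inequality integrates backwards from $s=1$:

```latex
\nu_{s}'(f)\geq -K\nu_{s}(f)
\;\Longrightarrow\;
\frac{\mathrm{d}}{\mathrm{d}s}\big(e^{Ks}\nu_{s}(f)\big)\geq 0
\;\Longrightarrow\;
\nu_{s}(f)\leq e^{K(1-s)}\nu_{1}(f)\leq e^{K}\nu(f).
```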

Lemma 17.

For all $s\in[0,1]$, and all $\tau_{1},\tau_{2}>0$ such that $1/\tau_{1}+1/\tau_{2}=1$,
\begin{align}
\left|\nu_{s}(f)-\nu_{0}(f)\right| &\leq K(\lambda,n)\,\nu\left(\left|R^{-}_{1,*}-q\right|^{\tau_{1}}\right)^{1/\tau_{1}}\cdot\nu\left(|f|^{\tau_{2}}\right)^{1/\tau_{2}},\tag{38}\\
\left|\nu_{s}(f)-\nu_{0}(f)-\nu_{0}'(f)\right| &\leq K(\lambda,n)\,\nu\left(\left|R^{-}_{1,*}-q\right|^{\tau_{1}}\right)^{1/\tau_{1}}\cdot\nu\left(|f|^{\tau_{2}}\right)^{1/\tau_{2}}.\tag{39}
\end{align}
Proof.

We use the Taylor bounds
\begin{align*}
\left|\nu_{s}(f)-\nu_{0}(f)\right| &\leq\sup_{0\leq s\leq 1}\left|\nu'_{s}(f)\right|,\\
\left|\nu_{s}(f)-\nu_{0}(f)-\nu_{0}'(f)\right| &\leq\sup_{0\leq s\leq 1}\left|\nu''_{s}(f)\right|,
\end{align*}
then Lemma 15 and the triangle inequality to bound the right-hand sides, then Hölder's inequality to bound each term in the derivative, and finally Lemma 16. (To compute the second derivative, one needs to apply Lemma 15 recursively.) $\blacksquare$

5.2 The cavity matrix

Recall the parameters $a(0)$, $a(1)$ and $a(2)$ from (11):
\[
a(0)=\operatorname{\mathbb{E}}\left[\langle x^{2}\rangle_{r}^{2}\right]-q^{*2}(\lambda),\quad a(1)=\operatorname{\mathbb{E}}\left[\langle x^{2}\rangle_{r}\langle x\rangle_{r}^{2}\right]-q^{*2}(\lambda),\quad a(2)=\operatorname{\mathbb{E}}\left[\langle x\rangle_{r}^{4}\right]-q^{*2}(\lambda),
\]
where $r=\lambda q^{*}(\lambda)$. Now let
\[
\bm{A}:=\lambda\cdot\begin{bmatrix}a(0)&-2a(1)&a(2)\\ a(1)&a(0)-a(1)-2a(2)&-2a(1)+3a(2)\\ a(2)&4a(1)-6a(2)&a(0)-6a(1)+6a(2)\end{bmatrix}.\tag{40}
\]

One can easily check that the transpose of this matrix has two eigenvalues $\mu_{1}$ and $\mu_{2}$, given by
\[
\mu_{1}(\lambda)=\lambda(a(0)-2a(1)+a(2)),\qquad \mu_{2}(\lambda)=\lambda(a(0)-3a(1)+2a(2)),\tag{41}
\]
with associated eigenvectors $(1,-2,1)$ and $(2,-3,2)$, of multiplicities two and one respectively. (The first eigenvalue appears in a $2\times 2$ Jordan block.) We will need to control the largest eigenvalue of $\bm{A}^{\top}$. This matrix is the ``planted'' analogue of the one displayed in (Talagrand, 2011a, equ. (1.234)) for the SK model. By Cauchy--Schwarz, $\mu_{1}-\mu_{2}=\lambda(a(1)-a(2))\geq 0$. As will be clear from the next subsection, the cavity computations we are about to present are only informative when $\mu_{1}<1$. Interestingly, this is true for all values of $\lambda$ where the $\mathsf{RS}$ formula $\phi_{\mathsf{RS}}$ has two derivatives:

Lemma 18.

For all $\lambda\in\mathcal{A}$, $\mu_{1}(\lambda)<1$.

Proof.

First, if $\lambda<\lambda_{c}$, then $q^{*}(\lambda)=0$ and $\mu_{1}(\lambda)=\lambda(\operatorname{\mathbb{E}}_{P_{\textup{x}}}[X^{2}])^{2}$. By Lemma 1, $\mu_{1}(\lambda)<1$. Now we assume $\lambda\in\mathcal{A}\cap(\lambda_{c},+\infty)$. Recall
\[
\psi(r)=\operatorname{\mathbb{E}}_{x^{*},z}\log\int\exp\left(\sqrt{r}zx+rxx^{*}-\frac{r}{2}x^{2}\right)\mathrm{d}P_{\textup{x}}(x),
\]
and the $\mathsf{RS}$ potential
\[
F(\lambda,q)=\psi(\lambda q)-\frac{\lambda q^{2}}{4}.
\]

It is a straightforward exercise to compute the first and second derivatives of $\psi$ using Gaussian integration by parts and the Nishimori property:
\[
\psi'(r)=\frac{1}{2}\operatorname{\mathbb{E}}\langle xx^{*}\rangle_{r},
\]
\[
\psi''(r)=\frac{1}{2}\left(\operatorname{\mathbb{E}}\langle x^{2}x^{*2}\rangle_{r}-2\operatorname{\mathbb{E}}\langle{x^{(1)}}^{2}x^{(2)}x^{*}\rangle_{r}+\operatorname{\mathbb{E}}\langle x^{(1)}x^{(2)}x^{(3)}x^{*}\rangle_{r}\right).
\]
With the choice $r=\lambda q^{*}(\lambda)$, we see that $\mu_{1}(\lambda)=2\lambda\psi''(r)$. Now we observe that
\[
\frac{\partial^{2}F}{\partial q^{2}}(\lambda,q)=\frac{\lambda}{2}\big(2\lambda\psi''(\lambda q)-1\big).
\]

Since $q^{*}(\lambda)$ is a maximizer of the smooth function $F(\lambda,\cdot)$ and lies in the interior of its domain ($q^{*}(\lambda)>0$ for $\lambda>\lambda_{c}$), it must be a first-order stationary point: $\frac{\partial F}{\partial q}(\lambda,q^{*})=0$. Hence $\frac{\partial^{2}F}{\partial q^{2}}(\lambda,q^{*})\leq 0$, i.e., $\mu_{1}(\lambda)\leq 1$ for all $\lambda>\lambda_{c}$. Now we claim that the inequality must be strict for $\lambda\in\mathcal{A}$. Indeed, Lelarge and Miolane (2016, Proposition 15) show that whenever $\phi_{\mathsf{RS}}$ is differentiable at $\lambda$, the maximizer of $F(\lambda,\cdot)$ is unique and
\[
\phi_{\mathsf{RS}}'(\lambda)=\frac{q^{*2}(\lambda)}{4}.
\]
Therefore, twice differentiability of $\phi_{\mathsf{RS}}$ implies differentiability of $\lambda\mapsto q^{*}(\lambda)$ wherever $q^{*}(\lambda)>0$ (i.e., $\lambda>\lambda_{c}$). Now we take advantage of first-order optimality: $\frac{\partial F}{\partial q}(\lambda,q^{*})=0$ is the same as
\[
\psi'(\lambda q^{*}(\lambda))=\frac{q^{*}(\lambda)}{2}.
\]
The above can be seen as an equality of functions of $\lambda$ defined almost everywhere. Taking one derivative yields
\[
q^{*}(\lambda)\,\psi''(\lambda q^{*}(\lambda))=\frac{1}{2}\big(1-2\lambda\psi''(\lambda q^{*}(\lambda))\big)\,q^{*\prime}(\lambda).
\]
Since $q^{*}(\lambda)$ and $\psi''(\lambda q^{*}(\lambda))$ are both positive, the left-hand side is nonzero, so the right-hand side cannot vanish; in particular $2\lambda\psi''(\lambda q^{*}(\lambda))\neq 1$, i.e., $\mu_{1}(\lambda)\neq 1$. Combined with $\mu_{1}(\lambda)\leq 1$, this concludes the proof. $\blacksquare$
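As a quick numerical sanity check of the eigenvalue formulas (41) (not needed for the proof), one can compute the spectrum of $\bm{A}$ directly; the values of $\lambda$ and the $a(i)$ below are arbitrary test placeholders, not derived from any prior.

```python
import numpy as np

# Check (41) numerically: the eigenvalues of A should be {mu1, mu1, mu2}.
# lam, a0, a1, a2 are arbitrary test values.
lam, a0, a1, a2 = 0.7, 3.0, 2.0, 1.0
A = lam * np.array([
    [a0, -2 * a1, a2],
    [a1, a0 - a1 - 2 * a2, -2 * a1 + 3 * a2],
    [a2, 4 * a1 - 6 * a2, a0 - 6 * a1 + 6 * a2],
])
mu1 = lam * (a0 - 2 * a1 + a2)       # multiplicity two (Jordan block)
mu2 = lam * (a0 - 3 * a1 + 2 * a2)
eig = np.sort(np.linalg.eigvals(A).real)
expected = np.sort(np.array([mu1, mu1, mu2]))
# Loose tolerance: the double eigenvalue sits in a 2x2 Jordan block, so the
# computed eigenvalues split at the scale of sqrt(machine epsilon).
print(np.allclose(eig, expected, atol=1e-6))  # True
```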

5.3 Cavity computations for the second moment

In this subsection we prove the convergence of the second moment of the overlaps:
\[
\nu((R_{1,*}-q^{*})^{2})\leq\frac{K}{N}+Ke^{-c(t)N},
\]
with $c(t)\sim c_{0}(1-t)^{2}$ as $t\to 1$ when $\lambda>\lambda_{c}$ and $P_{\textup{x}}$ is symmetric about the origin, and $c(t)$ uniformly lower-bounded by a positive constant otherwise. To lighten the notation in the calculations to come, $q^{*}(\lambda)$ will be denoted simply by $q$, and we recall the notation $\nu(\cdot)=\operatorname{\mathbb{E}}\langle\cdot\rangle_{t,1}$. Let
\[
A=\nu\left((R_{1,*}-q)^{2}\right),\quad B=\nu\left((R_{1,*}-q)(R_{2,*}-q)\right),\quad C=\nu\left((R_{1,*}-q)(R_{2,3}-q)\right).
\]

By symmetry between sites,
\[
A=\nu\left((R_{1,*}-q)(x_{N}x_{N}^{*}-q)\right)=\frac{1}{N}\nu\left(x_{N}x_{N}^{*}(x_{N}x_{N}^{*}-q)\right)+\nu\big((R^{-}_{1,*}-q)(x_{N}x_{N}^{*}-q)\big).
\]
By the first bound (38) of Lemma 17 with $\tau_{1}=1$, $\tau_{2}=\infty$, we get
\[
\nu(x_{N}x_{N}^{*}(x_{N}x_{N}^{*}-q))=\nu_{0}(x_{N}x_{N}^{*}(x_{N}x_{N}^{*}-q))+\delta=a(0)+\delta,
\]
with $|\delta|\leq K(\lambda)\,\nu(|R^{-}_{1,*}-q|)$. On the other hand, by the second bound (39) with $\tau_{1}=1$, $\tau_{2}=\infty$, we get
\[
\nu\big((R^{-}_{1,*}-q)(x_{N}x_{N}^{*}-q)\big)=\nu_{0}'\big((R^{-}_{1,*}-q)(x_{N}x_{N}^{*}-q)\big)+\delta.
\]

This is because $\nu_{0}((R^{-}_{1,*}-q)(x_{N}x_{N}^{*}-q))=0$, since the last variable $x_{N}$ decouples from the remaining $N-1$ variables under the measure $\nu_{0}$. Now we use Lemma 15 with $n=1$ to evaluate the above derivative at $s=0$. We still write $y^{(l)}=x_{N}^{(l)}$:
\begin{align*}
\nu_{0}'\big((R^{-}_{1,*}-q)(x_{N}x_{N}^{*}-q)\big) &=-\lambda t\,\nu_{0}\big(y^{(1)}y^{(2)}(y^{(1)}y^{*}-q)(R^{-}_{1,*}-q)(R^{-}_{1,2}-q)\big)\\
&\quad+\lambda t\,\nu_{0}\big(y^{(1)}y^{*}(y^{(1)}y^{*}-q)(R^{-}_{1,*}-q)^{2}\big)\\
&\quad-\lambda t\,\nu_{0}\big(y^{(2)}y^{*}(y^{(1)}y^{*}-q)(R^{-}_{1,*}-q)(R^{-}_{2,*}-q)\big)\\
&\quad+\lambda t\,\nu_{0}\big(y^{(2)}y^{(3)}(y^{(1)}y^{*}-q)(R^{-}_{1,*}-q)(R^{-}_{2,3}-q)\big).
\end{align*}
We extract the average over the $y$-variables from the rest of the expression as pre-factors, so that the above is equal to
\begin{align*}
&-\lambda t\,a(1)\,\nu_{0}\big((R^{-}_{1,*}-q)(R^{-}_{1,2}-q)\big)+\lambda t\,a(0)\,\nu_{0}\big((R^{-}_{1,*}-q)^{2}\big)\\
&-\lambda t\,a(1)\,\nu_{0}\big((R^{-}_{1,*}-q)(R^{-}_{2,*}-q)\big)+\lambda t\,a(2)\,\nu_{0}\big((R^{-}_{1,*}-q)(R^{-}_{2,3}-q)\big).
\end{align*}

By the Nishimori property, we notice that
\[
\nu_{0}\big((R^{-}_{1,*}-q)(R^{-}_{1,2}-q)\big)=\nu_{0}\big((R^{-}_{1,*}-q)(R^{-}_{2,*}-q)\big).
\]

Now we observe that $\nu_{0}'((R^{-}_{1,*}-q)(x_{N}x_{N}^{*}-q))$ is a linear combination of terms that resemble, but are not quite equal to, $A$, $B$ and $C$. We are nevertheless tempted to make the substitution, since we expect them to be close, and we use Lemma 17 to justify this. Taking $\nu_{0}((R^{-}_{1,*}-q)^{2})$ as an example, we apply the estimate (38) with $s=1$, $\tau_{1}=3$ and $\tau_{2}=3/2$. We get
\[
\nu_{0}\big((R^{-}_{1,*}-q)^{2}\big)=\nu\big((R^{-}_{1,*}-q)^{2}\big)+\delta,
\]
with $|\delta|\leq K(\lambda)\,\nu(|R^{-}_{1,*}-q|^{3})$. Moreover,
\[
\nu\big((R^{-}_{1,*}-q)^{2}\big)=\nu\Big(\big(R_{1,*}-\tfrac{1}{N}yy^{*}-q\big)^{2}\Big)=\nu\big((R_{1,*}-q)^{2}\big)-\frac{2}{N}\nu\big(yy^{*}(R_{1,*}-q)\big)+\frac{1}{N^{2}}\nu\big(y^{2}{y^{*}}^{2}\big).
\]
The third term is of order $1/N^{2}$, and the second term is bounded by $\frac{K}{N}\nu(|R_{1,*}-q|)$. Therefore
\[
\nu_{0}\big((R^{-}_{1,*}-q)^{2}\big)=\nu\big((R_{1,*}-q)^{2}\big)+\delta',
\]
with
\[
|\delta'|\leq K(\lambda)\left(\frac{1}{N}\nu(|R^{-}_{1,*}-q|)+\nu(|R^{-}_{1,*}-q|^{3})+\frac{1}{N^{2}}\right).
\]
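The expansion of $\nu((R^{-}_{1,*}-q)^{2})$ above is pure algebra and can be checked symbolically; the sketch below uses generic commuting symbols in place of the random quantities.

```python
import sympy as sp

# Symbolic check of the expansion: with R^- = R - y*ystar/N, the square
# (R^- - q)^2 expands into exactly the three displayed terms.
R, q, y, ystar, N = sp.symbols('R q y ystar N', positive=True)
lhs = (R - y * ystar / N - q) ** 2
rhs = ((R - q) ** 2
       - 2 / N * y * ystar * (R - q)
       + y ** 2 * ystar ** 2 / N ** 2)
print(sp.simplify(lhs - rhs))  # 0
```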

This argument applies equally to the remaining terms $\nu_{0}((R^{-}_{1,*}-q)(R^{-}_{2,*}-q))$ and $\nu_{0}((R^{-}_{1,*}-q)(R^{-}_{2,3}-q))$. We then end up with the identity
\[
A=\frac{a(0)}{N}+\lambda'a(0)A-2\lambda'a(1)B+\lambda'a(2)C+\delta(0),\tag{42}
\]
where $\lambda'=t\lambda$, and $|\delta(0)|$ is bounded by the same quantity as $|\delta'|$.

Next, we apply the same reasoning to $B$ and $C$ (Lemma 15 now needs to be applied with $n=2$ for $B$ and $n=3$ for $C$), and we get
\begin{align}
B &=\frac{a(1)}{N}+\lambda'a(1)A+\lambda'(a(0)-a(1)-2a(2))B+\lambda'(-2a(1)+3a(2))C+\delta(1),\tag{43}\\
C &=\frac{a(2)}{N}+\lambda'a(2)A+\lambda'(4a(1)-6a(2))B+\lambda'(a(0)-6a(1)+6a(2))C+\delta(2),\tag{44}
\end{align}
where for $i=0,1,2$,
\[
|\delta(i)|\leq K(\lambda)\left(\frac{1}{N}\nu(|R^{-}_{1,*}-q|)+\nu(|R^{-}_{1,*}-q|^{3})+\frac{1}{N^{2}}\right).\tag{45}
\]

We have ended up with a linear system in the quantities $A$, $B$ and $C$. Let $\bm{z}=[A,B,C]^{\top}$ and $\bm{\delta}=[\delta(0),\delta(1),\delta(2)]^{\top}$. Then the equations (42), (43) and (44) can be written as
\[
\bm{z}=\frac{1}{N}\bm{a}+t\bm{A}\bm{z}+\bm{\delta},\tag{46}
\]
where $\bm{a}=[a(0),a(1),a(2)]^{\top}$ and the matrix $\bm{A}$ is defined in (40). The above system implies useful bounds on the coefficients of the vector $\bm{z}$ only if the largest eigenvalue of the matrix $t\bm{A}$ is smaller than 1. This is ensured by Lemma 18 when $\lambda\in\mathcal{A}$ (independently of $t$). Now we can invert the linear system and extract $\bm{z}$:
\[
\bm{z}=\frac{1}{N}(\bm{I}-t\bm{A})^{-1}\bm{a}+(\bm{I}-t\bm{A})^{-1}\bm{\delta}.\tag{47}
\]
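To illustrate how (47) produces the $1/N$ rate, one can solve the system numerically in the regime $\mu_{1}<1$; all the numbers below ($\lambda$, $t$, the $a(i)$, and the magnitude of the stand-in error terms) are arbitrary illustrative choices.

```python
import numpy as np

# Illustration of (47): when the spectral radius of tA is below one,
# z = (I - tA)^{-1} (a/N + delta) has entries of order 1/N as soon as
# the error terms delta(i) are o(1/N).  All numbers are arbitrary.
lam, t = 0.7, 0.9
a0, a1, a2 = 0.5, 0.3, 0.2
a = np.array([a0, a1, a2])
A = lam * np.array([
    [a0, -2 * a1, a2],
    [a1, a0 - a1 - 2 * a2, -2 * a1 + 3 * a2],
    [a2, 4 * a1 - 6 * a2, a0 - 6 * a1 + 6 * a2],
])
assert np.abs(np.linalg.eigvals(t * A)).max() < 1  # the mu_1 < 1 regime

for N in (100, 1000, 10000):
    delta = np.full(3, N ** -1.5)  # stand-in for the delta(i) error terms
    z = np.linalg.solve(np.eye(3) - t * A, a / N + delta)
    print(N, N * z[0])  # the first coordinate, rescaled by N, stabilises
```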

Now we need to control the entries of $\bm{\delta}$. By elementary manipulations,
\[
\nu(|R^{-}_{1,*}-q|)\leq\nu(|R_{1,*}-q|)+\frac{K}{N},
\]
and
\[
\nu(|R^{-}_{1,*}-q|^{3})\leq\nu(|R_{1,*}-q|^{3})+\frac{K}{N}\nu((R_{1,*}-q)^{2})+\frac{K}{N^{2}}\nu(|R_{1,*}-q|)+\frac{K}{N^{3}}.
\]
Therefore, from (45) we have for all $i=0,1,2$,
\[
|\delta(i)|\leq K\left(\nu(|R_{1,*}-q|^{3})+\frac{1}{N}\nu((R_{1,*}-q)^{2})+\frac{1}{N}\nu(|R_{1,*}-q|)+\frac{1}{N^{2}}\right).\tag{48}
\]

Now we will argue that $\nu(|R_{1,*}-q|)\ll 1$ and $\nu(|R_{1,*}-q|^{3})\ll\nu((R_{1,*}-q)^{2})$. With Lemma 13 we have, for every $\epsilon>0$,
\[
\nu(|R_{1,*}-q|)\leq\epsilon+K(\epsilon)e^{-cN},
\]
and
\[
\nu(|R_{1,*}-q|^{3})\leq\epsilon\,\nu((R_{1,*}-q)^{2})+K(\epsilon)e^{-cN}.
\]

Combining the above two bounds with (48), and then injecting into (47), we get
\begin{align*}
\nu((R_{1,*}-q)^{2})=z(0) &\leq\left\|\bm{z}\right\|_{\ell_{2}}\leq\left\|\frac{1}{N}(\bm{I}-t\bm{A})^{-1}\bm{a}\right\|_{\ell_{2}}+\left\|(\bm{I}-t\bm{A})^{-1}\right\|_{\textup{op}}\left\|\bm{\delta}\right\|_{\ell_{2}}\\
&\leq\frac{\left\|\bm{c}\right\|_{\ell_{2}}}{N}+K\Big(\epsilon+\frac{1}{N}\Big)\nu((R_{1,*}-q)^{2})+K(\epsilon)e^{-cN}.
\end{align*}
The symbols $\left\|\cdot\right\|_{\ell_{2}}$ and $\|\cdot\|_{\textup{op}}$ refer to the $\ell_{2}$ norm of a vector and the matrix operator norm, respectively. Here, $\bm{c}=(\bm{I}-t\bm{A})^{-1}\bm{a}$. Note that the matrix inverses remain bounded even as $t\to 1$, since $\mu_{1}<1$ for $\lambda\in\mathcal{A}$. We choose $\epsilon$ small enough and $N$ large enough that $K(\epsilon+\frac{1}{N})<1$. We therefore get
\[
\nu\left((R_{1,*}-q)^{2}\right)\leq\frac{K(\lambda)}{N}+K(\lambda)e^{-c(t)N}.
\]

5.4 Cavity computations for the fourth moment

In this subsection we prove the convergence of the fourth moment:
\[
\nu((R_{1,*}-q^{*})^{4})\leq\frac{K}{N^{2}}+Ke^{-c(t)N},
\]
where $c(t)$ is of the same type as before. We adopt the same technique based on the cavity method, with the extra knowledge that the second moment converges. Many parts of the argument are exactly the same, so we will only highlight the main novelties in the proof. Let
\[
A=\nu\left((R_{1,*}-q)^{4}\right),\quad B=\nu\left((R_{1,*}-q)^{3}(R_{2,*}-q)\right),\quad C=\nu\left((R_{1,*}-q)^{3}(R_{2,3}-q)\right).
\]

By symmetry between sites,
\begin{align*}
A &=\nu\left((R_{1,*}-q)^{3}(x_{N}x_{N}^{*}-q)\right)\\
&=\nu\big((R^{-}_{1,*}-q)^{3}(x_{N}x_{N}^{*}-q)\big)+\frac{3}{N}\nu\big((R^{-}_{1,*}-q)^{2}x_{N}x_{N}^{*}(x_{N}x_{N}^{*}-q)\big)\\
&\quad+\frac{3}{N^{2}}\nu\big((R^{-}_{1,*}-q)x_{N}^{2}{x_{N}^{*}}^{2}(x_{N}x_{N}^{*}-q)\big)+\frac{1}{N^{3}}\nu\big(x_{N}^{3}{x_{N}^{*}}^{3}(x_{N}x_{N}^{*}-q)\big).
\end{align*}
The quadratic term is bounded as
\[
\nu\big((R^{-}_{1,*}-q)^{2}x_{N}x_{N}^{*}(x_{N}x_{N}^{*}-q)\big)\leq K\nu\big((R^{-}_{1,*}-q)^{2}\big)\leq\frac{K}{N}+Ke^{-cN}.
\]

The last inequality is using our extra knowledge about the convergence of the second moment. The last two terms are also bounded by K/N2𝐾superscript𝑁2K/N^{2} and K/N3𝐾superscript𝑁3K/N^{3} respectively. Now we must deal with the cubic term, and here, we apply the exact same technique used to deal with the term ν((R1,q)(xNxNq))𝜈subscriptsuperscript𝑅1𝑞subscript𝑥𝑁superscriptsubscript𝑥𝑁𝑞\nu((R^{-}_{1,*}-q)(x_{N}x_{N}^{*}-q)) in the previous proof. The argument goes verbatim. Then we equally treat the terms B𝐵B and C𝐶C. We end up with a similar linear system relating A𝐴A, B𝐵B and C𝐶C:

\[
\bm{z}=\frac{1}{N^{2}}\bm{d}+t\bm{A}\bm{z}+\bm{\delta},
\]

where $\bm{z}=[A,B,C]^{\top}$. The differences with the earlier linear system (46) lie in the vector of coefficients $\bm{d}$ (which can be determined from the recursions) and in the error terms $\delta(i)$, which are now bounded as

\[
|\delta(i)|\leq K\,\nu\left(|R^{-}_{1,*}-q|^{5}\right)+K\sum_{l=1}^{3}\frac{1}{N^{3-l}}\,\nu\left(|R^{-}_{1,*}-q|^{l}\right).
\]

Crucially, the matrix $\bm{A}$ remains the same. Using Lemma 13, we have for $\epsilon>0$,

\begin{align*}
\nu\left(|R_{1,*}-q|^{5}\right)&\leq\epsilon\,\nu\left((R_{1,*}-q)^{4}\right)+K(\epsilon)e^{-cN},\\
\nu\left(|R_{1,*}-q|^{3}\right)&\leq\epsilon\,\nu\left((R_{1,*}-q)^{2}\right)+K(\epsilon)e^{-cN}.
\end{align*}

With the bound we already have on $\nu((R_{1,*}-q)^{2})$, we finish the argument in the same way, by choosing $\epsilon$ sufficiently small. This concludes the proof of Theorem 7.

5.5 Sharper results: the asymptotic variance

Finally, given the convergence of the fourth moment, we can refine the convergence result for the second moment: we are now able to compute the limit of $N\,\nu((R_{1,*}-q)^{2})$. Using Jensen's inequality on the second and fourth moments, we have

\[
\nu\left(|R_{1,*}-q|\right)\leq\frac{K}{\sqrt{N}}+Ke^{-c(t)N},\qquad\text{and}\qquad\nu\left(|R_{1,*}-q|^{3}\right)\leq\frac{K}{N^{3/2}}+Ke^{-c'(t)N}.
\]

Looking back at (48), we can now assert that

\[
|\delta(i)|\leq\frac{K}{N^{3/2}}+Ke^{-c(t)N}.
\]

We plug this new bound into (47), and obtain

\[
\left\|N\bm{z}-(\bm{I}-t\bm{A})^{-1}\bm{a}\right\|_{\ell_{2}}\leq N\left\|(\bm{I}-t\bm{A})^{-1}\right\|_{\mathrm{op}}\left\|\bm{\delta}\right\|_{\ell_{2}}\leq K\Big(\frac{1}{\sqrt{N}}+Ne^{-c(t)N}\Big).
\]

The last line follows since $\sup_{t}\|(\bm{I}-t\bm{A})^{-1}\|_{\mathrm{op}}\leq K(\lambda)$ for $\lambda\in\mathcal{A}$. We have just proved that

\begin{align*}
\nu\left((R_{1,*}-q)^{2}\right)&=\frac{c(0)}{N}+K(\lambda)\Big(\frac{1}{N^{3/2}}+e^{-c(t)N}\Big),\\
\nu\left((R_{1,*}-q)(R_{2,*}-q)\right)&=\frac{c(1)}{N}+K(\lambda)\Big(\frac{1}{N^{3/2}}+e^{-c(t)N}\Big),\\
\nu\left((R_{1,*}-q)(R_{2,3}-q)\right)&=\frac{c(2)}{N}+K(\lambda)\Big(\frac{1}{N^{3/2}}+e^{-c(t)N}\Big),
\end{align*}

where $\bm{c}=(\bm{I}-t\bm{A})^{-1}\bm{a}$. One can solve this linear system explicitly and obtain the expression of the coordinates of $\bm{c}$:

\begin{align*}
c(0)&=\frac{1}{\lambda t}\left(-1+\frac{2}{1-t\mu_{2}}+\frac{2}{1-t\mu_{1}}+\frac{-3+3\lambda ta(0)-2\lambda ta(1)}{(1-t\mu_{1})^{2}}\right),\\
c(1)&=\frac{1}{\lambda t}\left(\frac{-3+3\lambda ta(0)-2\lambda ta(1)}{(1-t\mu_{1})^{2}}+\frac{3}{1-t\mu_{2}}\right),\\
c(2)&=\frac{4\lambda ta(1)^{2}+(1-\lambda ta(0)-5\lambda ta(1))a(2)+2\lambda ta(2)^{2}}{(1-t\mu_{1})^{2}(1-t\mu_{2})}.
\end{align*}

The expression of the first coordinate defines $\Delta_{\mathsf{RS}}(\lambda;t)$ in equation (27). This concludes the proof of Theorem 8.

5.6 Proof of Lemma 9

Let $f(x,x^{*})=x^{2}x^{*2}$. We have $\nu_{0}(f)=a(0)$. We use the first assertion of Lemma 17 with $\tau_{1}=1$ and $\tau_{2}=\infty$ to get

\[
\left|\nu(f)-\nu_{0}(f)\right|\leq K(\lambda)\,\nu\left(|R_{1,*}^{-}-q^{*}|\right)\leq\frac{K}{\sqrt{N}}+Ke^{-c(t)N},
\]

where the last bound follows from Theorem 7 and Jensen’s inequality.

6 Proof of Theorem 4

In this section we prove a slightly stronger result than convergence in distribution: we prove convergence of all moments with an explicit rate of $\mathcal{O}(N^{-1/2})$. Statement $(ii)$ is deduced effortlessly from a classical result, while statement $(i)$ requires more work. We start with the former.

6.1 Fluctuations under $\operatorname{\mathbb{P}}_{0}$: the ALR CLT

We assume in this subsection that $P_{\textup{x}}=\frac{1}{2}\delta_{-1}+\frac{1}{2}\delta_{+1}$, and let $\bm{Y}\sim\operatorname{\mathbb{P}}_{0}$, i.e., $Y_{ij}\sim\mathcal{N}(0,1)$ i.i.d. The likelihood ratio is then related to the partition function of the Sherrington–Kirkpatrick (SK) model via a trivial relation:

\begin{align*}
\log L(\bm{Y};\lambda)&=\log\int\exp\Big(\sqrt{\frac{\lambda}{N}}\sum_{i<j}Y_{ij}x_{i}x_{j}-\frac{\lambda}{2N}\sum_{i<j}x_{i}^{2}x_{j}^{2}\Big)\,\mathrm{d}P_{\textup{x}}^{\otimes N}(\bm{x})\\
&=\log\sum_{\bm{\sigma}\in\{\pm 1\}^{N}}\exp\Big(\frac{\beta}{\sqrt{N}}\sum_{i<j}Y_{ij}\sigma_{i}\sigma_{j}\Big)-N\log 2-\frac{\beta^{2}(N-1)}{4}\\
&=:\log Z_{N}(\beta)-N\log 2-\frac{\beta^{2}(N-1)}{4},
\end{align*}

where we have let $\beta=\sqrt{\lambda}$. Here $Z_{N}(\beta)$ is the partition function of the SK model at inverse temperature $\beta>0$. It is easy to compute the expectation of $Z_{N}(\beta)$:

\[
\log\operatorname{\mathbb{E}}Z_{N}(\beta)=N\log 2+\frac{\beta^{2}(N-1)}{4},
\]

so that $\operatorname{\mathbb{E}}\log L(\bm{Y};\lambda)$ is the gap between the free energy of the SK model and its annealed version. Determining the values of the inverse temperature for which this gap is zero (or constant), i.e., the temperatures at which the free energy is given by the annealed computation, is a central question in statistical physics. Aizenman, Lebowitz and Ruelle (ALR) proved that in the high-temperature regime $\beta<1$, $\log(Z_{N}(\beta)/\operatorname{\mathbb{E}}Z_{N}(\beta))$ converges in distribution to the normal law

\[
\mathcal{N}\left(\frac{1}{4}\left(\log(1-\beta^{2})+\beta^{2}\right),\,-\frac{1}{2}\left(\log(1-\beta^{2})+\beta^{2}\right)\right).
\]

In our notation this simply means

\[
\log L(\bm{Y};\lambda)\rightsquigarrow\mathcal{N}(-\mu,\sigma^{2})
\]

under $\operatorname{\mathbb{P}}_{0}$, where $\mu=\frac{1}{2}\sigma^{2}=-\frac{1}{4}(\log(1-\lambda)+\lambda)$. The ALR proof is combinatorial and uses so-called cluster expansion techniques; it may not extend to other types of priors. Alternative proofs were subsequently found by adopting different perspectives on the problem (Comets and Neveu, 1995; Guerra and Toninelli, 2002a). A more recent proof based on the cavity method is provided by Talagrand in his second book (Talagrand, 2011b, Section 11.4). His method provides an explicit (and optimal) rate of convergence of the moments of the random variable in question to those of the Gaussian. In what follows we use Talagrand's approach to prove a similar central limit theorem when $\bm{Y}\sim\operatorname{\mathbb{P}}_{\lambda}$, for an arbitrary bounded prior $P_{\textup{x}}$. In this more general setting, the high-temperature region of the model is given by the condition $\lambda<\lambda_{c}$.
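As a quick numerical illustration of the annealed computation above (a simulation sketch, not part of the paper; the values $N=8$, $\beta=0.6$ and the sample size are arbitrary choices), one can check the identity $\log\operatorname{\mathbb{E}}Z_{N}(\beta)=N\log 2+\beta^{2}(N-1)/4$ by exact enumeration over spin configurations and Monte Carlo averaging over the Gaussian disorder:

```python
import numpy as np

rng = np.random.default_rng(0)
N, beta, samples = 8, 0.6, 20000

# All 2^N spin configurations sigma in {-1,+1}^N, one per row.
sigma = np.array([[1 - 2 * ((m >> i) & 1) for i in range(N)]
                  for m in range(2 ** N)], dtype=float)
iu, ju = np.triu_indices(N, k=1)              # index pairs i < j
pair_prods = sigma[:, iu] * sigma[:, ju]      # sigma_i * sigma_j for each pair

# Monte Carlo estimate of E Z_N(beta) over the Gaussian couplings Y_ij.
Z = np.empty(samples)
for s in range(samples):
    Y = rng.standard_normal(iu.size)
    Z[s] = np.exp(beta / np.sqrt(N) * pair_prods @ Y).sum()

log_EZ_mc = np.log(Z.mean())
log_EZ_exact = N * np.log(2) + beta ** 2 * (N - 1) / 4
print(log_EZ_mc, log_EZ_exact)   # the two values should agree closely
```

The agreement reflects that each coupling contributes its Gaussian moment generating function $\exp(\beta^{2}/2N)$ to $\operatorname{\mathbb{E}}Z_{N}(\beta)$, once for each of the $N(N-1)/2$ pairs.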

6.2 Fluctuations under $\operatorname{\mathbb{P}}_{\lambda}$: a planted version of the ALR CLT

Let $\lambda<\lambda_{c}$, and $\bm{Y}\sim\operatorname{\mathbb{P}}_{\lambda}$. We define the random variable

\[
X(\lambda)=\log L(\bm{Y})-\mu(\lambda),
\]

where

\[
\mu(\lambda)=\frac{1}{4}\left(-\log(1-\lambda)-\lambda\right),\qquad\text{and}\qquad b(\lambda)=\sigma^{2}(\lambda)=2\mu(\lambda).
\]

We will prove that the integer moments of $X(\lambda)$ converge to those of the Gaussian with variance $b(\lambda)$. This is a sufficient condition for convergence in distribution to hold, since the Gaussian is uniquely determined by its moments.

Theorem 19.

For all $\lambda<\lambda_{c}$ and integers $k$, there exists a constant $K(\lambda,k)\geq 0$ such that

\[
\left|\operatorname{\mathbb{E}}\left[X(\lambda)^{k}\right]-m(k)\,b(\lambda)^{k/2}\right|\leq\frac{K(\lambda,k)}{\sqrt{N}},
\]

where $m(k)=\operatorname{\mathbb{E}}[g^{k}]$ is the $k$-th moment of the standard Gaussian $g\sim\mathcal{N}(0,1)$.
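An informal numerical sanity check of the statement at small size (a simulation sketch, not part of the proof; the values $N=10$, $\lambda=0.25$ and the sample size are arbitrary): draw $\bm{Y}\sim\operatorname{\mathbb{P}}_{\lambda}$ with a Rademacher spike, evaluate $\log L$ exactly by enumeration over $\{\pm 1\}^{N}$, and compare the empirical mean with the limiting mean $\mu(\lambda)$. Note $\operatorname{\mathbb{E}}_{\lambda}\log L$ is exactly the Kullback-Leibler divergence between the planted and null models, hence nonnegative at every $N$.

```python
import numpy as np

rng = np.random.default_rng(1)
N, lam, samples = 10, 0.25, 2000
mu = -0.25 * (np.log(1 - lam) + lam)        # limiting mean mu(lambda)

# All 2^N Rademacher configurations, one per row, and pairwise products.
sigma = np.array([[1 - 2 * ((m >> i) & 1) for i in range(N)]
                  for m in range(2 ** N)], dtype=float)
iu, ju = np.triu_indices(N, k=1)
pair_prods = sigma[:, iu] * sigma[:, ju]

logL = np.empty(samples)
for s in range(samples):
    x = rng.choice([-1.0, 1.0], size=N)                      # spike x*
    Y = np.sqrt(lam / N) * x[iu] * x[ju] + rng.standard_normal(iu.size)
    e = np.sqrt(lam / N) * pair_prods @ Y                    # energies
    m = e.max()                                              # log-sum-exp trick
    logL[s] = m + np.log(np.exp(e - m).sum()) - N * np.log(2) - lam * (N - 1) / 4

print(logL.mean(), mu)   # empirical mean vs. limiting mean
```

At such a small $N$ the finite-size correction is visible, but the empirical mean is already of the same small positive order as $\mu(\lambda)$.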

This theorem mirrors Theorem 11.4.1 in (Talagrand, 2011b), and our approach is inspired by his. We define the function

\[
f(\lambda):=\operatorname{\mathbb{E}}\left[X(\lambda)^{k}\right].
\]
Lemma 20.

For all $\lambda<\lambda_{c}$,

\begin{align}
f'(\lambda)&=-\frac{k}{4}\operatorname{\mathbb{E}}\left[\left(N\langle R_{1,2}^{2}\rangle-\langle x_{N}^{2}\rangle^{2}\right)X(\lambda)^{k-1}\right]+\frac{k}{2}\operatorname{\mathbb{E}}\left[\left(N\langle R_{1,*}^{2}\rangle-\langle x_{N}^{2}x_{N}^{*2}\rangle\right)X(\lambda)^{k-1}\right]\nonumber\\
&\quad-k\mu'(\lambda)\operatorname{\mathbb{E}}\left[X(\lambda)^{k-1}\right]+\frac{k(k-1)}{4}\operatorname{\mathbb{E}}\left[\left(N\langle R_{1,2}^{2}\rangle-\langle x_{N}^{2}\rangle^{2}\right)X(\lambda)^{k-2}\right].\tag{49}
\end{align}
Proof.

This is by simple differentiation and regrouping of terms. \blacksquare

The derivative involves averages of the form

\[
\operatorname{\mathbb{E}}\left[\left(N\langle R_{1,l}^{2}\rangle-\langle{x^{(1)}_{N}}^{2}{x^{(l)}_{N}}^{2}\rangle\right)X(\lambda)^{k}\right],
\]

for $l=2,*$. In the first line of (49), we see that the planted term $l=*$ has a pre-factor twice as big as that of the replica term $l=2$. This is the reason the mean of the limiting Gaussian is $\mu$ and not $-\mu$ in the planted case. A crucial step in the argument is to show that $X(\lambda)^{k}$ and its pre-factor in the above expression are asymptotically uncorrelated, so that one can split the expectation:

Proposition 21.

For all $\lambda<\lambda_{c}$, integers $k\geq 1$, and $l\in\{2,*\}$, we have

\[
\operatorname{\mathbb{E}}\left[\left(N\langle R_{1,l}^{2}\rangle-\langle{x^{(1)}_{N}}^{2}{x^{(l)}_{N}}^{2}\rangle\right)X(\lambda)^{k}\right]=\frac{\lambda}{1-\lambda}\operatorname{\mathbb{E}}\left[X(\lambda)^{k}\right]+\delta,
\]

where $|\delta|\leq K(k,\lambda)/\sqrt{N}$.

Proof of Theorem 19. Proposition 21 implies

\[
f'(\lambda)=k\left(\frac{1}{4}\frac{\lambda}{1-\lambda}-\mu'(\lambda)\right)\operatorname{\mathbb{E}}\left[X(\lambda)^{k-1}\right]+\frac{k(k-1)}{4}\frac{\lambda}{1-\lambda}\operatorname{\mathbb{E}}\left[X(\lambda)^{k-2}\right]+\delta.
\]

Notice that with our choice of the function $\mu$, the first term on the right-hand side vanishes: indeed $\mu'(\lambda)=\frac{1}{4}\big(\frac{1}{1-\lambda}-1\big)=\frac{1}{4}\frac{\lambda}{1-\lambda}$. (Setting this term to zero provides another way of discovering the function $\mu$.) Now we let $b(\lambda)=2\mu(\lambda)$. We have for all $\lambda$ and all $k\geq 2$

\[
\frac{\mathrm{d}}{\mathrm{d}\lambda}\operatorname{\mathbb{E}}\left[X(\lambda)^{k}\right]=\frac{k(k-1)}{2}b'(\lambda)\operatorname{\mathbb{E}}\left[X(\lambda)^{k-2}\right]+\delta.\tag{50}
\]

By induction, and since $X(0)=0$, we see that for all even $k$

\[
\operatorname{\mathbb{E}}\left[X(\lambda)^{k}\right]=m(k)\,b(\lambda)^{k/2}+\mathcal{O}\left(\frac{K(k,\lambda)}{\sqrt{N}}\right),
\]

where $m(k)=(k-1)m(k-2)$ and $m(0)=1$. This recursion defines the sequence of even Gaussian moments. As for odd values of $k$, we have already proved in Corollary 3 that

\[
\left|\operatorname{\mathbb{E}}\left[X(\lambda)\right]\right|\leq\frac{K(\lambda)}{\sqrt{N}}.
\]

We use induction again on (50) to conclude that for all odd $k$,

\[
\left|\operatorname{\mathbb{E}}\left[X(\lambda)^{k}\right]\right|\leq\frac{K(k,\lambda)}{\sqrt{N}}.
\]
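The recursion $m(k)=(k-1)m(k-2)$, $m(0)=1$, indeed reproduces the even Gaussian moments $1,3,15,105,\dots$; a quick numerical check (an illustration only, not part of the argument) against direct integration of $x^{k}$ against the standard Gaussian density:

```python
import numpy as np

def m(k):
    """Even Gaussian moments via the recursion m(k) = (k-1) m(k-2), m(0) = 1."""
    return 1.0 if k == 0 else (k - 1) * m(k - 2)

# Trapezoidal integration of x^k * phi(x) over a wide grid.
x = np.linspace(-12.0, 12.0, 200001)
phi = np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)

moments = {}
for k in (2, 4, 6, 8):
    f = x ** k * phi
    moments[k] = float(((f[:-1] + f[1:]) / 2 * np.diff(x)).sum())
    print(k, m(k), moments[k])   # recursion vs. numerical integral
```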

 

6.3 Proof of Proposition 21

The argument is in two stages. We first prove that

\[
N\cdot\operatorname{\mathbb{E}}\left[\langle R_{1,l}^{2}\rangle X(\lambda)^{k}\right]=\frac{1}{1-\lambda}\operatorname{\mathbb{E}}\left[\langle{x^{(1)}_{N}}^{2}{x^{(l)}_{N}}^{2}\rangle X(\lambda)^{k}\right]+\delta,\tag{51}
\]

and then

\[
\operatorname{\mathbb{E}}\left[\langle{x^{(1)}_{N}}^{2}{x^{(l)}_{N}}^{2}\rangle X(\lambda)^{k}\right]=\operatorname{\mathbb{E}}\left[X(\lambda)^{k}\right]+\delta,\tag{52}
\]

where in both cases $|\delta|\leq K(k,\lambda)/\sqrt{N}$. We once again use the cavity method to extract the last variable $x_{N}$ and analyze its influence. Let

\begin{align*}
-H_{t}(\bm{x})&:=\sum_{1\leq i<j\leq N-1}-\frac{\lambda}{2N}x_{i}^{2}x_{j}^{2}+\sqrt{\frac{\lambda}{N}}W_{ij}x_{i}x_{j}+\frac{\lambda}{N}x_{i}x_{i}^{*}x_{j}x_{j}^{*}\\
&\quad+\sum_{i=1}^{N-1}-\frac{t\lambda}{2N}x_{i}^{2}x_{N}^{2}+\sqrt{\frac{t\lambda}{N}}W_{iN}x_{i}x_{N}+\frac{t\lambda}{N}x_{i}x_{i}^{*}x_{N}x_{N}^{*},
\end{align*}

and

\[
Y(t):=\log\int e^{-H_{t}(\bm{x})}\,\mathrm{d}P_{\textup{x}}^{\otimes N}(\bm{x})-\mu(\lambda).
\]

We have $Y(1)=X(\lambda)$. We consider the quantity

\[
\varphi(t;l):=\operatorname{\mathbb{E}}\left[\left(N\langle R_{1,l}^{2}\rangle_{t}-\langle{x^{(1)}_{N}}^{2}{x^{(l)}_{N}}^{2}\rangle_{t}\right)Y(t)^{k}\right].
\]

Our strategy is to approximate $\varphi(t;l)$ by $\varphi(0;l)+\varphi'(0;l)$. The approach is very similar to the one used to prove optimal rates of convergence of the overlaps. By symmetry between sites, we have

\[
\varphi(t;l)=N\operatorname{\mathbb{E}}\left[\left\langle x_{N}^{(1)}x_{N}^{(l)}R_{1,l}^{-}\right\rangle_{t}Y(t)^{k}\right].
\]

Notice that since the last variables decouple from the rest of the system at $t=0$, we have

\begin{align*}
\varphi(0;l)&=N\operatorname{\mathbb{E}}\left[\langle x_{N}^{(1)}x_{N}^{(l)}\rangle_{0}\right]\cdot\operatorname{\mathbb{E}}\left[\left\langle R_{1,l}^{-}\right\rangle_{0}Y(0)^{k}\right]\\
&=N\operatorname{\mathbb{E}}_{P_{\textup{x}}}[X]^{2}\cdot\operatorname{\mathbb{E}}\left[\left\langle R_{1,l}^{-}\right\rangle_{0}Y(0)^{k}\right]=0.
\end{align*}

The expressions of the derivatives are a bit cumbersome, so we do not display them but describe their main features. From here onwards, we present the proof of (51) and (52) for $l=2$ for concreteness. The exact same argument goes through for $l=*$; the only difference is in the number of terms showing up in the derivatives of $\varphi$, not their nature. The derivative $\varphi'(t;2)$ is a sum of different terms, all of the form

\[
\lambda Nc(k)\operatorname{\mathbb{E}}\left[\left\langle R^{-}_{1,2}R^{-}_{a,b}x_{N}^{(1)}x_{N}^{(2)}x_{N}^{(a)}x_{N}^{(b)}\right\rangle_{t}Y(t)^{n}\right],\tag{53}
\]

where $n\in\{k-2,k-1,k\}$ and $(a,b)\in\{(1,2),(1,3),(3,4),(1,*),(3,*)\}$, and $c(k)$ is a polynomial of degree $\leq 2$ in $k$. We see that at $t=0$, if the above expression involves a variable $x_{N}$ of degree 1 then the term vanishes. Therefore the only remaining term is the one where $(a,b)=(1,2)$. One can verify that $c(k)=1$ for this term. Therefore

\begin{align}
\varphi'(0;2)&=\lambda N\operatorname{\mathbb{E}}\left[\langle{x_{N}^{(1)}}^{2}{x_{N}^{(2)}}^{2}\rangle_{0}\right]\cdot\operatorname{\mathbb{E}}\left[\langle(R_{1,2}^{-})^{2}\rangle_{0}Y(0)^{k}\right]\nonumber\\
&=\lambda N\operatorname{\mathbb{E}}_{P_{\textup{x}}}\left[X^{2}\right]^{2}\cdot\operatorname{\mathbb{E}}\left[\langle(R_{1,2}^{-})^{2}\rangle_{0}Y(0)^{k}\right]\nonumber\\
&=\lambda N\operatorname{\mathbb{E}}\left[\langle(R_{1,2}^{-})^{2}\rangle_{0}Y(0)^{k}\right].\tag{54}
\end{align}

Now we turn to $\varphi''(t;2)$. Taking another derivative generates monomials of degree three in the overlaps and the last variable, so $\varphi''(t;2)$ is a sum of terms of the form

\[
\lambda^{2}Nc'(k)\operatorname{\mathbb{E}}\left[\left\langle R^{-}_{1,2}R^{-}_{a,b}R^{-}_{c,d}x_{N}^{(1)}x_{N}^{(2)}x_{N}^{(a)}x_{N}^{(b)}x_{N}^{(c)}x_{N}^{(d)}\right\rangle_{t}Y(t)^{n}\right],\tag{55}
\]

where $c'(k)$ is a polynomial of degree $\leq 3$ in $k$, and $n\in\{k-3,k-2,k-1,k\}$. Our goal is to bound the second derivative independently of $t$, so that we are able to use the Taylor approximation

\[
\left|\varphi(1;2)-\varphi(0;2)-\varphi'(0;2)\right|\leq\sup_{0\leq t\leq 1}\left|\varphi''(t;2)\right|.\tag{56}
\]

Since the prior $P_{\textup{x}}$ has bounded support, Hölder's inequality implies that (55) is bounded by

\begin{align*}
NK(k,\lambda)&\operatorname{\mathbb{E}}\left[\left\langle\left|R^{-}_{1,2}R^{-}_{a,b}R^{-}_{c,d}\right|\right\rangle_{t}^{p}\right]^{1/p}\operatorname{\mathbb{E}}\left[|Y(t)|^{nq}\right]^{1/q}\\
&\leq NK(k,\lambda)\operatorname{\mathbb{E}}\left[\left\langle|R^{-}_{1,2}|^{3p}\right\rangle_{t}\right]^{1/p}\operatorname{\mathbb{E}}\left[|Y(t)|^{nq}\right]^{1/q},
\end{align*}

where $1/p+1/q=1$. The last bound follows from Jensen's inequality (since $p\geq 1$) and another application of Hölder's inequality. We let $p=4/3$ and $q=4$. Using a straightforward analogue of Lemma 16 for the measure $\operatorname{\mathbb{E}}\langle\cdot\rangle_{t}$, and the convergence of the fourth moment, Theorem 7, we have

\[
\operatorname{\mathbb{E}}\left\langle(R^{-}_{1,2})^{4}\right\rangle_{t}\leq K(\lambda)\operatorname{\mathbb{E}}\left\langle(R^{-}_{1,2})^{4}\right\rangle\leq\frac{K(\lambda)}{N^{2}}.
\]

We use the following lemma to bound the moments of $Y(t)$:

Lemma 22.

For all $\lambda<\lambda_{c}$ and integers $k$, there exists a constant $K(k,\lambda)\geq 0$ such that for all $t\in[0,1]$

\[
\operatorname{\mathbb{E}}\left[Y(t)^{2k}\right]\leq K(k,\lambda).
\]
Proof.

Taking a derivative w.r.t. time, we have

\begin{align*}
\frac{\mathrm{d}}{\mathrm{d}t}\operatorname{\mathbb{E}}\left[Y(t)^{k}\right]={}&-\frac{\lambda k}{2}\operatorname{\mathbb{E}}\left[\left\langle x_{N}^{(1)}x_{N}^{(2)}R_{1,2}^{-}\right\rangle_{t}Y(t)^{k-1}\right]+\lambda k\operatorname{\mathbb{E}}\left[\left\langle x_{N}^{(1)}x_{N}^{*}R_{1,*}^{-}\right\rangle_{t}Y(t)^{k-1}\right]\\
&+\frac{\lambda k(k-1)}{2}\operatorname{\mathbb{E}}\left[\left\langle x_{N}^{(1)}x_{N}^{(2)}R_{1,2}^{-}\right\rangle_{t}Y(t)^{k-2}\right].
\end{align*}

By Hölder’s inequality and boundedness of the variables and overlaps,

\[
\left|\frac{\mathrm{d}}{\mathrm{d}t}\operatorname{\mathbb{E}}\left[Y(t)^{k}\right]\right|\leq K(k,\lambda)\left(\operatorname{\mathbb{E}}\left[|Y(t)|^{k}\right]^{1-1/k}+\operatorname{\mathbb{E}}\left[|Y(t)|^{k}\right]^{1-2/k}\right).
\]

The first term is generated by the terms involving $Y(t)^{k-1}$ in the derivative, and the second comes from the one involving $Y(t)^{k-2}$. Since $k$ is even, we may drop the absolute values on the right-hand side. Next, we use the fact that $x^{a}\leq 1+x$ for all $x\geq 0$ and $0\leq a\leq 1$, and conclude with Grönwall's lemma. $\blacksquare$
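The Grönwall step can be made explicit (a sketch under the estimates above, writing $u(t):=\operatorname{\mathbb{E}}[Y(t)^{2k}]$):

```latex
% From the derivative bound together with x^a \le 1 + x (x \ge 0, 0 \le a \le 1):
u'(t) \le K(k,\lambda)\bigl(1 + u(t)\bigr), \qquad t \in [0,1],
% so Gronwall's lemma gives, uniformly over t \in [0,1],
1 + u(t) \le \bigl(1 + u(0)\bigr)\, e^{K(k,\lambda)\,t}
         \le \bigl(1 + u(0)\bigr)\, e^{K(k,\lambda)}.
```

Here $u(0)$ is finite since at $t=0$ the last variable decouples, so it is controlled by the same estimates for the system with one fewer variable.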

Therefore by the above estimates we have

\[
\sup_{0\leq t\leq 1}|\varphi''(t;2)|\leq\frac{K(k,\lambda)}{\sqrt{N}}.\tag{57}
\]

Now, our next goal is to prove

\[
\left|\varphi'(0;2)-\lambda N\operatorname{\mathbb{E}}\left[\langle R_{1,2}^{2}\rangle X(\lambda)^{k}\right]\right|\leq\frac{K(k,\lambda)}{\sqrt{N}}.\tag{58}
\]

We consider the function (this should come as no surprise at this point)

\[
\psi(t):=\lambda N\operatorname{\mathbb{E}}\left[\left\langle(R_{1,2}^{-})^{2}\right\rangle_{t}Y(t)^{k}\right].
\]

Observe that (54) tells us $\psi(0)=\varphi'(0;2)$. On the other hand,

\[
\left|\psi(1)-\lambda N\operatorname{\mathbb{E}}\left[\langle R_{1,2}^{2}\rangle X(\lambda)^{k}\right]\right|\leq 2\lambda\operatorname{\mathbb{E}}\left[\left\langle\left|R_{1,2}^{-}x_{N}^{(1)}x_{N}^{(2)}\right|\right\rangle|X(\lambda)|^{k}\right]+\frac{\lambda}{N}\operatorname{\mathbb{E}}\left[\left\langle{x_{N}^{(1)}}^{2}{x_{N}^{(2)}}^{2}\right\rangle|X(\lambda)|^{k}\right].
\]

Using Lemma 22 and Hölder's inequality, the first term is bounded by $K(k,\lambda)(\operatorname{\mathbb{E}}\langle(R_{1,2}^{-})^{2}\rangle)^{1/2}\leq K(k,\lambda)/\sqrt{N}$, and the second term is bounded by $K(k,\lambda)/N$. So it suffices to show that

\[
\sup_{0\leq t\leq 1}|\psi'(t)|\leq\frac{K(k,\lambda)}{\sqrt{N}}.
\]

Similarly to $\varphi$, the derivative of $\psi$ is a sum of terms of the form

\[
\lambda^{2}Nc(k)\operatorname{\mathbb{E}}\left[\left\langle(R^{-}_{1,2})^{2}R^{-}_{a,b}x_{N}^{(a)}x_{N}^{(b)}\right\rangle_{t}Y(t)^{n}\right].
\]

It is clear that the same method used to bound $\varphi''$ (the generic term of which is (55)) also works in this case, so we obtain the desired bound on $\psi'$. Finally, using (56), (57) and (58), we obtain

\[
N\operatorname{\mathbb{E}}\left[\langle R_{1,2}^{2}\rangle X(\lambda)^{k}\right]-\operatorname{\mathbb{E}}\left[\langle{x^{(1)}_{N}}^{2}{x^{(2)}_{N}}^{2}\rangle X(\lambda)^{k}\right]=\lambda N\operatorname{\mathbb{E}}\left[\langle R_{1,2}^{2}\rangle X(\lambda)^{k}\right]+\delta,
\]

where $|\delta|\leq K(k,\lambda)/\sqrt{N}$. This is equivalent to (51) and closes the first stage of the argument. Now we need to show that

\[
\operatorname{\mathbb{E}}\left[\langle{x^{(1)}_{N}}^{2}{x^{(2)}_{N}}^{2}\rangle X(\lambda)^{k}\right]=\operatorname{\mathbb{E}}\left[X(\lambda)^{k}\right]+\delta.
\]

The argument has by now become routine: we consider the function

\[
\psi(t)=\operatorname{\mathbb{E}}\left[\langle{x^{(1)}_{N}}^{2}{x^{(2)}_{N}}^{2}\rangle_{t}Y(t)^{k}\right].
\]

We have

\[
\psi(0)=\operatorname{\mathbb{E}}\left[\langle{x^{(1)}_{N}}^{2}{x^{(2)}_{N}}^{2}\rangle_{0}\right]\cdot\operatorname{\mathbb{E}}\left[Y(0)^{k}\right]=\operatorname{\mathbb{E}}_{P_{\textup{x}}}[X^{2}]^{2}\cdot\operatorname{\mathbb{E}}\left[Y(0)^{k}\right]=\operatorname{\mathbb{E}}\left[Y(0)^{k}\right].
\]

The derivative of $\psi$ is a sum of terms of the form

\[
\lambda c(k)\operatorname{\mathbb{E}}\left[\left\langle{x^{(1)}_{N}}^{2}{x^{(2)}_{N}}^{2}R^{-}_{a,b}x_{N}^{(a)}x_{N}^{(b)}\right\rangle_{t}Y(t)^{n}\right].
\]

By our earlier argument, $|\psi'(t)|\leq K(k,\lambda)/\sqrt{N}$ for all $t$. We similarly argue that $\left|\frac{\mathrm{d}}{\mathrm{d}t}\operatorname{\mathbb{E}}[Y(t)^{k}]\right|\leq K(k,\lambda)/\sqrt{N}$ for all $t$, so that

\[
\left|\psi(1)-\operatorname{\mathbb{E}}\left[Y(1)^{k}\right]\right|\leq\frac{K(k,\lambda)}{\sqrt{N}}.
\]

This yields (52) and thus concludes the proof.

Appendix A Appendix

Here, we prove Lemma 10. A straightforward calculation reveals that

\[
\frac{\partial}{\partial s}\bar{\psi}(r,s)=\operatorname{\mathbb{E}}\left[\langle xx^{*}\rangle\right],\qquad\text{and}\qquad\frac{\partial^{2}}{\partial s^{2}}\bar{\psi}(r,s)=\operatorname{\mathbb{E}}\left[x^{*2}\left(\langle x^{2}\rangle-\langle x\rangle^{2}\right)\right]>0,
\]

so that $s\mapsto\bar{\psi}(r,s)$ is Lipschitz and strongly convex on any bounded interval, for all $r\geq 0$.

Let $\nu=P_{\textup{x}}$, and let $\mu$ be the symmetric part of $P_{\textup{x}}$, i.e., $\mu(A)=(P_{\textup{x}}(A)+P_{\textup{x}}(-A))/2$ for all Borel $A\subseteq\mathbb{R}$. We observe that $\nu$ is absolutely continuous with respect to $\mu$, so that the Radon–Nikodym derivative $\frac{\mathrm{d}\nu}{\mathrm{d}\mu}$ is a well-defined measurable function from $\mathbb{R}$ to $\mathbb{R}_{+}$ that integrates to one.

Proposition 23.

For all $r\geq 0$, we have

$$\bar{\psi}(r,r)-\bar{\psi}(r,-r)\geq 2\,\mathbb{E}\left[\left\langle\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(x)-1\right\rangle_{\mu,r}^{2}\right],$$

where $\langle\cdot\rangle_{\mu,r}$ is the average with respect to the Gibbs measure corresponding to the Gaussian channel $y=\sqrt{r}\,x^*+z$, with $x^*\sim\mu$ and $z\sim\mathcal{N}(0,1)$. Moreover, if $r>0$, the right-hand side of the above inequality is zero if and only if $\mu=\nu$, i.e., if and only if the prior $P_{\textup{x}}$ is symmetric.
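For a two-point prior, both sides of the inequality in Proposition 23 reduce to one-dimensional Gaussian integrals, so the bound can be checked by quadrature. The sketch below is illustrative only (the weight $p$ and the value of $r$ are assumptions, not from the paper); for $x\in\{-1,+1\}$ the Gibbs average reduces to $\langle\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(x)-1\rangle_{\mu,r}=(2p-1)\tanh(\sqrt{r}z+rx^*)$:

```python
import numpy as np

p, r = 0.7, 0.8                      # illustrative prior weight and SNR (assumptions)
z = np.linspace(-8.0, 8.0, 20001)    # quadrature grid for z ~ N(0,1)
w = np.exp(-z**2 / 2)
w /= w.sum()                         # normalized Gaussian weights

def psibar(r, s):
    # psibar(r, s) = E_z int log int exp(sqrt(r) z x + s x x* - r/2 x^2) dnu(x) dnu(x*)
    val = 0.0
    for xs, wxs in ((+1, p), (-1, 1 - p)):
        inner = (p * np.exp(np.sqrt(r) * z + s * xs - r / 2)
                 + (1 - p) * np.exp(-np.sqrt(r) * z - s * xs - r / 2))
        val += wxs * float(np.sum(w * np.log(inner)))
    return val

lhs = psibar(r, r) - psibar(r, -r)

# Right-hand side: 2 E[ <dnu/dmu(x) - 1>_{mu,r}^2 ] with mu uniform on {-1,+1}
rhs = 0.0
for xs in (+1, -1):
    rhs += 0.5 * float(np.sum(w * ((2 * p - 1) * np.tanh(np.sqrt(r) * z + r * xs)) ** 2))
rhs *= 2

assert rhs > 0 and lhs >= rhs - 1e-9   # the bound of Proposition 23
```

Since $\nu$ here is asymmetric and $r>0$, the right-hand side is strictly positive, in accordance with the "if and only if" part of the statement.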

Finally, we will need the following monotonicity statement.

Lemma 24.

The map $r\mapsto\mathbb{E}\left[\left\langle\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(x)-1\right\rangle_{\mu,r}^{2}\right]$ is increasing on $\mathbb{R}_+$.

Proof.

This is a matter of showing that the derivative of the above function is non-negative. By standard manipulations (Gaussian integration by parts, Nishimori property), the derivative can be written as

$$\mathbb{E}\left[\left\langle x\left(\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(x)-1\right)\right\rangle_{\mu,r}^{2}\right].$$

This quantity is manifestly non-negative, which completes the proof. $\blacksquare$
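The monotonicity in Lemma 24 can also be checked numerically for a two-point prior (the weight $p$ below is an illustrative assumption); in that case $\langle\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(x)-1\rangle_{\mu,r}=(2p-1)\tanh(\sqrt{r}z+rx^*)$, and the expectation over $z$ is computed by quadrature:

```python
import numpy as np

p = 0.7                              # illustrative prior weight (assumption)
m = 2 * p - 1
z = np.linspace(-8.0, 8.0, 20001)    # quadrature grid for z ~ N(0,1)
w = np.exp(-z**2 / 2)
w /= w.sum()                         # normalized Gaussian weights

def overlap(r):
    # E[ <dnu/dmu(x) - 1>_{mu,r}^2 ], averaging x* over mu = uniform on {-1,+1}
    val = 0.0
    for xs in (+1, -1):
        val += 0.5 * float(np.sum(w * np.tanh(np.sqrt(r) * z + r * xs) ** 2))
    return m**2 * val

rs = np.linspace(0.05, 2.0, 40)
vals = np.array([overlap(r) for r in rs])
assert np.all(np.diff(vals) > 0)     # increasing in r, as Lemma 24 asserts
```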

Proof of Proposition 23. The argument relies on a smooth interpolation between the two measures $\mu$ and $\nu$. Let $t\in[0,1]$ and let $\rho_t=(1-t)\mu+t\nu$. Further, let $r,s\geq 0$ be fixed, and define

$$\bar{\psi}(r,s;t):=\mathbb{E}_z\int\left(\log\int\exp\left(\sqrt{r}\,zx+sxx^*-\frac{r}{2}x^2\right)\mathrm{d}\rho_t(x)\right)\mathrm{d}\rho_t(x^*),$$

where $z\sim\mathcal{N}(0,1)$. Now let

$$\phi(t)=\bar{\psi}(r,r;t)-\bar{\psi}(r,-r;t).$$

We have $\phi(1)=\bar{\psi}(r,r)-\bar{\psi}(r,-r)$ on the one hand, and since $\mu$ is a symmetric distribution, $\phi(0)=0$ on the other. We will show that $\phi$ is a convex increasing function on the interval $[0,1]$, strictly so if $\mu\neq\nu$, and that $\phi'(0)=0$. We then deduce that $\phi(1)\geq\frac{\phi''(0)}{2}$, allowing us to conclude. First, we have

$$\begin{aligned}
\frac{\mathrm{d}}{\mathrm{d}t}\bar{\psi}(r,r;t) &= \mathbb{E}_z\int\log\int e^{\sqrt{r}zx+rxx^*-\frac{r}{2}x^2}\,\mathrm{d}\rho_t(x)\;\mathrm{d}(\nu-\mu)(x^*)\\
&\quad+\mathbb{E}_z\int\frac{\int e^{\sqrt{r}zx+rxx^*-\frac{r}{2}x^2}\,\mathrm{d}(\nu-\mu)(x)}{\int e^{\sqrt{r}zx+rxx^*-\frac{r}{2}x^2}\,\mathrm{d}\rho_t(x)}\;\mathrm{d}\rho_t(x^*),
\end{aligned}$$

and

$$\begin{aligned}
\frac{\mathrm{d}^2}{\mathrm{d}t^2}\bar{\psi}(r,r;t) &= 2\,\mathbb{E}_z\int\frac{\int e^{\sqrt{r}zx+rxx^*-\frac{r}{2}x^2}\,\mathrm{d}(\nu-\mu)(x)}{\int e^{\sqrt{r}zx+rxx^*-\frac{r}{2}x^2}\,\mathrm{d}\rho_t(x)}\;\mathrm{d}(\nu-\mu)(x^*)\\
&\quad-2\,\mathbb{E}_z\int\left(\frac{\int e^{\sqrt{r}zx+rxx^*-\frac{r}{2}x^2}\,\mathrm{d}(\nu-\mu)(x)}{\int e^{\sqrt{r}zx+rxx^*-\frac{r}{2}x^2}\,\mathrm{d}\rho_t(x)}\right)^2\mathrm{d}\rho_t(x^*).
\end{aligned}$$

Similar expressions hold for $\bar{\psi}(r,-r;t)$, where $x^*$ is replaced by $-x^*$ inside the exponentials. We see from the expression of the first derivative at $t=0$ that $\frac{\mathrm{d}}{\mathrm{d}t}\bar{\psi}(r,r;0)=\frac{\mathrm{d}}{\mathrm{d}t}\bar{\psi}(r,-r;0)$. This is because $\rho_0=\mu$ is symmetric about the origin, so a sign change (of $x$ for the first term, and of $x^*$ for the second term in the expression) does not affect the value of the integrals. Hence $\phi'(0)=0$. Now we focus on the second derivative. Observe that since $\mu$ is the symmetric part of $\nu$, the signed measure $\nu-\mu$ is anti-symmetric. This implies that the first term in the expression of the second derivative changes sign under a sign change in $x^*$, while keeping the same modulus. As for the second term, a sign change in $x^*$ induces integration against $\mathrm{d}\rho_t(-x^*)$. Hence we can write the difference $\phi''(t)=\left(\bar{\psi}(r,r;t)-\bar{\psi}(r,-r;t)\right)''$ as

$$\begin{aligned}
\frac{\mathrm{d}^2}{\mathrm{d}t^2}\phi(t) &= 4\,\mathbb{E}_z\int\frac{\int e^{\sqrt{r}zx+rxx^*-\frac{r}{2}x^2}\,\mathrm{d}(\nu-\mu)(x)}{\int e^{\sqrt{r}zx+rxx^*-\frac{r}{2}x^2}\,\mathrm{d}\rho_t(x)}\;\mathrm{d}(\nu-\mu)(x^*)\\
&\quad-2\,\mathbb{E}_z\int\left(\frac{\int e^{\sqrt{r}zx+rxx^*-\frac{r}{2}x^2}\,\mathrm{d}(\nu-\mu)(x)}{\int e^{\sqrt{r}zx+rxx^*-\frac{r}{2}x^2}\,\mathrm{d}\rho_t(x)}\right)^2\bigl(\mathrm{d}\rho_t(x^*)-\mathrm{d}\rho_t(-x^*)\bigr).
\end{aligned}$$

For any Borel $A$, we have $\rho_t(A)-\rho_t(-A)=(1-t)(\mu(A)-\mu(-A))+t(\nu(A)-\nu(-A))=2t(\nu-\mu)(A)$. Therefore the second term in the above expression becomes

$$-4t\,\mathbb{E}_z\int\left(\frac{\int e^{\sqrt{r}zx+rxx^*-\frac{r}{2}x^2}\,\mathrm{d}(\nu-\mu)(x)}{\int e^{\sqrt{r}zx+rxx^*-\frac{r}{2}x^2}\,\mathrm{d}\rho_t(x)}\right)^2\mathrm{d}(\nu-\mu)(x^*).$$
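The measure identity used in this step, $\rho_t(A)-\rho_t(-A)=2t(\nu-\mu)(A)$, is elementary; a minimal discrete check (two-point measures, with an illustrative weight $p$ and interpolation time $t$) reads:

```python
# Check rho_t(A) - rho_t(-A) = 2 t (nu - mu)(A) on singletons A = {a},
# for an illustrative two-point nu and its symmetric part mu.
p, t = 0.7, 0.3
nu = {+1: p, -1: 1 - p}
mu = {a: (nu[a] + nu[-a]) / 2 for a in nu}          # symmetric part of nu
rho = {a: (1 - t) * mu[a] + t * nu[a] for a in nu}  # rho_t = (1 - t) mu + t nu
for a in (+1, -1):
    assert abs((rho[a] - rho[-a]) - 2 * t * (nu[a] - mu[a])) < 1e-12
```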

Since both $\mu$ and $\nu$ are absolutely continuous with respect to $\rho_t$ for all $0\leq t<1$, we write

$$\frac{\mathrm{d}^2}{\mathrm{d}t^2}\phi(t)=4\,\mathbb{E}_{z,x^*}\left\langle\frac{\mathrm{d}(\nu-\mu)}{\mathrm{d}\rho_t}(x)\,\frac{\mathrm{d}(\nu-\mu)}{\mathrm{d}\rho_t}(x^*)\right\rangle-4t\,\mathbb{E}_{z,x^*}\left\langle\frac{\mathrm{d}(\nu-\mu)}{\mathrm{d}\rho_t}(x)\right\rangle^2,$$

where the Gibbs average is with respect to the posterior of $x$ given $z,x^*$ under the Gaussian channel $y=\sqrt{r}\,x^*+z$, and the expectation is under $x^*\sim\rho_t$ and $z\sim\mathcal{N}(0,1)$. By the Nishimori property, we simplify the above expression to

$$\frac{\mathrm{d}^2}{\mathrm{d}t^2}\phi(t)=4(1-t)\,\mathbb{E}\left[\left\langle\frac{\mathrm{d}(\nu-\mu)}{\mathrm{d}\rho_t}(x)\right\rangle^2\right],$$

where the expression is valid for all $0\leq t<1$. From here we see that the function $\phi$ is convex on $[0,1]$ (where we have closed the right end of the interval by continuity). Since $\phi(0)=\phi'(0)=0$, $\phi$ is also increasing on $[0,1]$. Therefore we have

$$\phi(1)\geq\frac{1}{2}\phi''(0)=2\,\mathbb{E}\left[\left\langle\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(x)-1\right\rangle_{\mu,r}^{2}\right].$$

It now remains to show that if $\bar{\psi}(r,r)=\bar{\psi}(r,-r)$ for some $r>0$, then $\mu=\nu$. By the lower bound we have shown, equality of $\bar{\psi}(r,r)$ and $\bar{\psi}(r,-r)$ would imply

$$\left\langle\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(x)\right\rangle_{\mu,r}=\frac{\int e^{\sqrt{r}zx+rxx^*-\frac{r}{2}x^2}\,\mathrm{d}\nu(x)}{\int e^{\sqrt{r}zx+rxx^*-\frac{r}{2}x^2}\,\mathrm{d}\mu(x)}=1$$

for (Lebesgue-)almost all $z$ and $P_{\textup{x}}$-almost all $x^*$. We make the change of variable $z\mapsto\sqrt{r}(z-x^*)$ and complete the square; the above is then equivalent to

$$\int e^{-\frac{r}{2}(x-z)^2}\,\mathrm{d}\nu(x)=\int e^{-\frac{r}{2}(x-z)^2}\,\mathrm{d}\mu(x)$$

for almost all $z$. The two sides are the convolutions of the measures $\nu$ and $\mu$ with the Gaussian kernel. By taking the Fourier transform on both sides and using Fubini's theorem, we obtain equality of the characteristic functions of $\mu$ and $\nu$: for all $\xi\in\mathbb{R}$,

$$\int e^{\mathbf{i}\xi x}\,\mathrm{d}\nu(x)=\int e^{\mathbf{i}\xi x}\,\mathrm{d}\mu(x).$$

This is because the Fourier transform of the Gaussian kernel (another Gaussian) vanishes nowhere on the real line, so it can be divided out on both sides. This of course implies that $\nu=\mu$, and concludes our proof. $\blacksquare$
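The contrapositive of this deconvolution step can be illustrated numerically: when $\nu$ differs from its symmetric part $\mu$, the two Gaussian smoothings in the displayed identity already disagree at some $z$. The two-point measures and the value of $r$ below are illustrative assumptions:

```python
import numpy as np

p, r = 0.7, 1.0                  # illustrative asymmetric weight and r > 0
z = np.linspace(-4.0, 4.0, 801)  # grid of evaluation points

def smoothed(weights):
    # z -> int exp(-r/2 (x - z)^2) d(measure)(x), measure supported on {-1, +1}
    return (weights[+1] * np.exp(-r / 2 * (1 - z) ** 2)
            + weights[-1] * np.exp(-r / 2 * (-1 - z) ** 2))

nu = {+1: p, -1: 1 - p}
mu = {+1: 0.5, -1: 0.5}          # symmetric part of nu
gap = float(np.max(np.abs(smoothed(nu) - smoothed(mu))))
assert gap > 0.05                # the smoothed measures visibly differ
```

Conversely, equality of the two smoothings for all $z$ would force the characteristic functions to agree, since the Gaussian factor never vanishes.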

Acknowledgments. AE is grateful to Léo Miolane for insightful conversations. This research effort was initiated at the Workshop on Statistical physics, Learning, Inference and Networks at Ecole de Physique des Houches, winter 2017. FK acknowledges funding from the EU (FP/2007-2013/ERC grant agreement 307087-SPARCS). MJ acknowledges the support of the Mathematical Data Science program of the Office of Naval Research under grant number N00014-15-1-2670.

References

  • Aizenman et al., (1987) Aizenman, M., Lebowitz, J. L., and Ruelle, D. (1987). Some rigorous results on the Sherrington–Kirkpatrick spin glass model. Communications in Mathematical Physics, 112(1):3–20.
  • Aizenman et al., (2003) Aizenman, M., Sims, R., and Starr, S. L. (2003). Extended variational principle for the Sherrington-Kirkpatrick spin-glass model. Physical Review B, 68(21):214403.
  • Aleksandrov, (1939) Aleksandrov, A. (1939). Almost everywhere existence of the second differential of a convex function and some properties of convex functions. Leningrad Univ. Ann., 37:3–35.
  • Amini and Wainwright, (2009) Amini, A. A. and Wainwright, M. J. (2009). High-dimensional analysis of semidefinite relaxations for sparse principal components. Annals of Statistics, 37(5B):2877–2921.
  • Baik et al., (2005) Baik, J., Arous, G. B., Péché, S., et al. (2005). Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices. Annals of Probability, 33(5):1643–1697.
  • Baik and Lee, (2016) Baik, J. and Lee, J. O. (2016). Fluctuations of the free energy of the spherical Sherrington–Kirkpatrick model. Journal of Statistical Physics, 165(2):185–224.
  • Baik and Lee, (2017) Baik, J. and Lee, J. O. (2017). Fluctuations of the free energy of the spherical Sherrington–Kirkpatrick model with ferromagnetic interaction. In Annales Henri Poincaré, volume 18, pages 1867–1917. Springer.
  • Baik and Silverstein, (2006) Baik, J. and Silverstein, J. W. (2006). Eigenvalues of large sample covariance matrices of spiked population models. Journal of Multivariate Analysis, 97(6):1382–1408.
  • Banks et al., (2017) Banks, J., Moore, C., Vershynin, R., Verzelen, N., and Xu, J. (2017). Information-theoretic bounds and phase transitions in clustering, sparse PCA, and submatrix localization. In IEEE International Symposium on Information Theory (ISIT), pages 1137–1141. IEEE.
  • Barbier et al., (2016) Barbier, J., Dia, M., Macris, N., Krzakala, F., Lesieur, T., and Zdeborová, L. (2016). Mutual information for symmetric rank-one matrix estimation: A proof of the replica formula. In Advances in Neural Information Processing Systems (NIPS), pages 424–432.
  • Benaych-Georges and Nadakuditi, (2011) Benaych-Georges, F. and Nadakuditi, R. R. (2011). The eigenvalues and eigenvectors of finite, low rank perturbations of large random matrices. Advances in Mathematics, 227(1):494–521.
  • Berthet and Rigollet, (2013) Berthet, Q. and Rigollet, P. (2013). Optimal detection of sparse principal components in high dimension. Annals of Statistics, 41(4):1780–1815.
  • Boucheron et al., (2013) Boucheron, S., Lugosi, G., and Massart, P. (2013). Concentration inequalities: A nonasymptotic theory of independence. Oxford university press.
  • Capitaine et al., (2009) Capitaine, M., Donati-Martin, C., and Féral, D. (2009). The largest eigenvalues of finite rank deformation of large Wigner matrices: convergence and nonuniversality of the fluctuations. Annals of Probability, pages 1–47.
  • Comets and Neveu, (1995) Comets, F. and Neveu, J. (1995). The Sherrington-Kirkpatrick model of spin glasses and stochastic calculus: the high temperature case. Communications in Mathematical Physics, 166(3):549–564.
  • Deshpande et al., (2016) Deshpande, Y., Abbe, E., and Montanari, A. (2016). Asymptotic mutual information for the binary stochastic block model. In IEEE International Symposium on Information Theory (ISIT), pages 185–189. IEEE.
  • Deshpande and Montanari, (2014) Deshpande, Y. and Montanari, A. (2014). Information-theoretically optimal sparse PCA. In IEEE International Symposium on Information Theory (ISIT), pages 2197–2201. IEEE.
  • Dobriban, (2017) Dobriban, E. (2017). Sharp detection in PCA under correlations: all eigenvalues matter. Annals of Statistics, 45(4):1810–1833.
  • Féral and Péché, (2007) Féral, D. and Péché, S. (2007). The largest eigenvalue of rank one deformation of large Wigner matrices. Communications in Mathematical Physics, 272(1):185–228.
  • Franz and Parisi, (1995) Franz, S. and Parisi, G. (1995). Recipes for metastable states in spin glasses. Journal de Physique I, 5(11):1401–1415.
  • Guerra, (2001) Guerra, F. (2001). Sum rules for the free energy in the mean field spin glass model. Fields Institute Communications, 30:161–170.
  • Guerra, (2003) Guerra, F. (2003). Broken replica symmetry bounds in the mean field spin glass model. Communications in Mathematical Physics, 233(1):1–12.
  • (23) Guerra, F. and Toninelli, F. L. (2002a). Central limit theorem for fluctuations in the high temperature region of the Sherrington–Kirkpatrick spin glass model. Journal of Mathematical Physics, 43(12):6224–6237.
  • (24) Guerra, F. and Toninelli, F. L. (2002b). The thermodynamic limit in mean field spin glass models. Communications in Mathematical Physics, 230(1):71–79.
  • Johnstone, (2001) Johnstone, I. M. (2001). On the distribution of the largest eigenvalue in principal components analysis. Annals of Statistics, pages 295–327.
  • Johnstone and Lu, (2009) Johnstone, I. M. and Lu, A. Y. (2009). On consistency and sparsity for principal components analysis in high dimensions. Journal of the American Statistical Association, 104(486):682–693.
  • Korada and Macris, (2009) Korada, S. B. and Macris, N. (2009). Exact solution of the gauge symmetric p-spin glass model on a complete graph. Journal of Statistical Physics, 136(2):205–230.
  • Krzakala et al., (2016) Krzakala, F., Xu, J., and Zdeborová, L. (2016). Mutual information in rank-one matrix estimation. In Information Theory Workshop (ITW), pages 71–75. IEEE.
  • Le Cam, (1960) Le Cam, L. (1960). Locally Asymptotically Normal Families of Distributions. Certain Approximations to Families of Distributions and Their Use in the Theory of Estimation and Testing Hypotheses. Berkeley & Los Angeles.
  • Lelarge and Miolane, (2016) Lelarge, M. and Miolane, L. (2016). Fundamental limits of symmetric low-rank matrix estimation. arXiv preprint arXiv:1611.03888.
  • Lesieur et al., (2015) Lesieur, T., Krzakala, F., and Zdeborová, L. (2015). Phase transitions in sparse PCA. In IEEE International Symposium on Information Theory (ISIT), pages 1635–1639. IEEE.
  • Lesieur et al., (2017) Lesieur, T., Krzakala, F., and Zdeborová, L. (2017). Constrained low-rank matrix estimation: Phase transitions, approximate message passing and applications. arXiv preprint arXiv:1701.00858.
  • Mézard et al., (1990) Mézard, M., Parisi, G., and Virasoro, M.-A. (1990). Spin glass theory and beyond. World Scientific Publishing.
  • Nadler, (2008) Nadler, B. (2008). Finite sample approximation results for principal component analysis: A matrix perturbation approach. Annals of Statistics, pages 2791–2817.
  • Nishimori, (2001) Nishimori, H. (2001). Statistical physics of spin glasses and information processing: an introduction, volume 111. Clarendon Press.
  • Onatski et al., (2013) Onatski, A., Moreira, M. J., and Hallin, M. (2013). Asymptotic power of sphericity tests for high-dimensional data. Annals of Statistics, 41(3):1204–1231.
  • Onatski et al., (2014) Onatski, A., Moreira, M. J., and Hallin, M. (2014). Signal detection in high dimension: The multispiked case. Annals of Statistics, 42(1):225–254.
  • Paul, (2007) Paul, D. (2007). Asymptotics of sample eigenstructure for a large dimensional spiked covariance model. Statistica Sinica, pages 1617–1642.
  • Péché, (2006) Péché, S. (2006). The largest eigenvalue of small rank perturbations of Hermitian random matrices. Probability Theory and Related Fields, 134(1):127–173.
  • Péché, (2014) Péché, S. (2014). Deformed ensembles of random matrices. In Proceedings of the International Congress of Mathematicians, Seoul, pages 1059–1174. ICM.
  • Perry et al., (2016) Perry, A., Wein, A. S., Bandeira, A. S., and Moitra, A. (2016). Optimality and sub-optimality of PCA for spiked random matrices and synchronization. arXiv preprint arXiv:1609.05573.
  • Talagrand, (2006) Talagrand, M. (2006). The Parisi formula. Annals of Mathematics, pages 221–263.
  • (43) Talagrand, M. (2011a). Mean field models for spin glasses. Volume I: Basic examples, volume 54. Springer Science & Business Media.
  • (44) Talagrand, M. (2011b). Mean field models for spin glasses. Volume II: Advanced replica-symmetry and low temperature, volume 55. Springer Science & Business Media.
  • Van der Vaart, (2000) Van der Vaart, A. W. (2000). Asymptotic statistics (Cambridge series in statistical and probabilistic mathematics). Cambridge University Press.
  • van Handel, (2014) van Handel, R. (2014). Probability in high dimension. Technical report, Princeton University, NJ.